* [patch 01/12] expose ACPI pmtimer to userspace (/dev/pmtimer)
2008-05-29 22:22 [patch 00/12] fake ACPI C2 emulation v2 Marcelo Tosatti
@ 2008-05-29 22:22 ` Marcelo Tosatti
2008-06-01 16:34 ` Thomas Gleixner
2008-05-29 22:22 ` [patch 02/12] KVM: allow multiple IO bitmap pages, provide userspace interface Marcelo Tosatti
From: Marcelo Tosatti @ 2008-05-29 22:22 UTC
To: Avi Kivity
Cc: Chris Wright, Glauber Costa, Anthony Liguori, kvm,
Marcelo Tosatti, john stultz, Thomas Gleixner
[-- Attachment #1: pmtimer-dev --]
[-- Type: text/plain, Size: 4032 bytes --]
KVM wishes to allow direct guest access to the ACPI pmtimer. In that
case QEMU/KVM has to read the current value for migration, so the proper
syncing can be done on the destination.
This patch will not register the device if the chipset has an unreliable
timer.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
CC: john stultz <johnstul@us.ibm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
Index: kvm/drivers/char/Kconfig
===================================================================
--- kvm.orig/drivers/char/Kconfig
+++ kvm/drivers/char/Kconfig
@@ -1057,6 +1057,13 @@ config HPET_MMAP
exposed to the user. If this applies to your hardware,
say N here.
+config ACPI_PMTIMER_DEV
+ tristate "ACPI PM-Timer device"
+ default n
+ depends on ACPI && X86_PM_TIMER
+ help
+ Allow userspace to read the ACPI PM-Timer value.
+
config HANGCHECK_TIMER
tristate "Hangcheck timer"
depends on X86 || IA64 || PPC64 || S390
Index: kvm/drivers/char/Makefile
===================================================================
--- kvm.orig/drivers/char/Makefile
+++ kvm/drivers/char/Makefile
@@ -112,6 +112,7 @@ obj-$(CONFIG_PS3_FLASH) += ps3flash.o
obj-$(CONFIG_JS_RTC) += js-rtc.o
js-rtc-y = rtc.o
+obj-$(CONFIG_ACPI_PMTIMER_DEV) += acpi_pmtimer.o
# Files generated that shall be removed upon make clean
clean-files := consolemap_deftbl.c defkeymap.c
Index: kvm/drivers/char/acpi_pmtimer.c
===================================================================
--- /dev/null
+++ kvm/drivers/char/acpi_pmtimer.c
@@ -0,0 +1,61 @@
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/fs.h>
+#include <linux/miscdevice.h>
+#include <linux/acpi_pmtmr.h>
+
+#include <asm/io.h>
+#include <asm/uaccess.h>
+
+static ssize_t pmtimer_read(struct file *file, char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ int ret;
+ __u32 value;
+
+ ret = -EINVAL;
+ if (count < sizeof(u32))
+ goto out;
+
+ value = inl(pmtmr_ioport) & ACPI_PM_MASK;
+
+ ret = -EFAULT;
+ if (put_user(value, (u32 __user *)buf))
+ goto out;
+
+ ret = sizeof(value);
+out:
+ return ret;
+}
+
+static const struct file_operations acpi_pmtimer_fops = {
+ .owner = THIS_MODULE,
+ .read = pmtimer_read,
+};
+
+static struct miscdevice pmtimer_miscdev = {
+ MISC_DYNAMIC_MINOR,
+ "pmtimer",
+ &acpi_pmtimer_fops,
+};
+
+static int __init pmtimer_init(void)
+{
+ if (!pmtmr_ioport || !pmtimer_is_reliable())
+ return -ENODEV;
+
+ return misc_register(&pmtimer_miscdev);
+}
+
+static void __exit pmtimer_exit(void)
+{
+ if (pmtmr_ioport && pmtimer_is_reliable())
+ misc_deregister(&pmtimer_miscdev);
+}
+
+module_init(pmtimer_init);
+module_exit(pmtimer_exit);
+MODULE_AUTHOR ("Marcelo Tosatti <mtosatti@redhat.com>");
+MODULE_DESCRIPTION("ACPI PM-Timer");
+MODULE_LICENSE ("GPL");
+
Index: kvm/drivers/clocksource/acpi_pm.c
===================================================================
--- kvm.orig/drivers/clocksource/acpi_pm.c
+++ kvm/drivers/clocksource/acpi_pm.c
@@ -30,6 +30,8 @@
*/
u32 pmtmr_ioport __read_mostly;
+static int reliable_pmtimer;
+
static inline u32 read_pmtmr(void)
{
/* mask the output to 24 bits */
@@ -208,10 +210,21 @@ pm_good:
if (verify_pmtmr_rate() != 0)
return -ENODEV;
+ if (clocksource_acpi_pm.read == acpi_pm_read)
+ reliable_pmtimer = 1;
+
return clocksource_register(&clocksource_acpi_pm);
}
+int pmtimer_is_reliable(void)
+{
+ return reliable_pmtimer;
+}
+
/* We use fs_initcall because we want the PCI fixups to have run
* but we still need to load before device_initcall
*/
fs_initcall(init_acpi_pm_clocksource);
+
+EXPORT_SYMBOL(pmtmr_ioport);
+EXPORT_SYMBOL(pmtimer_is_reliable);
Index: kvm/include/linux/acpi_pmtmr.h
===================================================================
--- kvm.orig/include/linux/acpi_pmtmr.h
+++ kvm/include/linux/acpi_pmtmr.h
@@ -27,6 +27,8 @@ static inline u32 acpi_pm_read_early(voi
extern void pmtimer_wait(unsigned);
+int pmtimer_is_reliable(void);
+
#else
static inline u32 acpi_pm_read_early(void)
--
* Re: [patch 01/12] expose ACPI pmtimer to userspace (/dev/pmtimer)
2008-05-29 22:22 ` [patch 01/12] expose ACPI pmtimer to userspace (/dev/pmtimer) Marcelo Tosatti
@ 2008-06-01 16:34 ` Thomas Gleixner
2008-06-01 16:56 ` Anthony Liguori
2008-06-01 17:56 ` Marcelo Tosatti
From: Thomas Gleixner @ 2008-06-01 16:34 UTC
To: Marcelo Tosatti
Cc: Avi Kivity, Chris Wright, Glauber Costa, Anthony Liguori, kvm,
john stultz
On Thu, 29 May 2008, Marcelo Tosatti wrote:
> KVM wishes to allow direct guest access to the ACPI pmtimer. In that
> case QEMU/KVM has to read the current value for migration, so the proper
> syncing can be done on the destination.
I don't understand from the above which problem you are trying to
solve. Which pmtimer is read out, the one of the host (physical
hardware) or the one of the guest (emulated hardware) ? What is synced
at the destination ?
> This patch will not register the device if the chipset has an unreliable
> timer.
Can we please keep that code inside of drivers/clocksource/acpi_pm.c
without creating a new disconnected file in drivers/char ?
Btw, depending on the use case we might as well have a sysfs entry for that.
> +static ssize_t pmtimer_read(struct file *file, char __user *buf, size_t count,
> + loff_t *ppos)
> +{
> + int ret;
> + __u32 value;
> +
> + ret = -EINVAL;
> + if (count < sizeof(u32))
> + goto out;
return -EINVAL;
> +
> + value = inl(pmtmr_ioport) & ACPI_PM_MASK;
> +
> + ret = -EFAULT;
> + if (put_user(value, (u32 __user *)buf))
> + goto out;
return -EFAULT;
> + ret = sizeof(value);
return sizeof(value);
> +out:
> + return ret;
> +}
> Index: kvm/drivers/clocksource/acpi_pm.c
> ===================================================================
> --- kvm.orig/drivers/clocksource/acpi_pm.c
> +++ kvm/drivers/clocksource/acpi_pm.c
> @@ -30,6 +30,8 @@
> */
> u32 pmtmr_ioport __read_mostly;
>
> +static int reliable_pmtimer;
> +
> static inline u32 read_pmtmr(void)
> {
> /* mask the output to 24 bits */
> @@ -208,10 +210,21 @@ pm_good:
> if (verify_pmtmr_rate() != 0)
> return -ENODEV;
>
> + if (clocksource_acpi_pm.read == acpi_pm_read)
> + reliable_pmtimer = 1;
> +
> return clocksource_register(&clocksource_acpi_pm);
> }
>
> +int pmtimer_is_reliable(void)
> +{
> + return reliable_pmtimer;
return clocksource_acpi_pm.read == acpi_pm_read;
So we don't need reliable_pmtimer at all.
Thanks,
tglx
* Re: [patch 01/12] expose ACPI pmtimer to userspace (/dev/pmtimer)
2008-06-01 16:34 ` Thomas Gleixner
@ 2008-06-01 16:56 ` Anthony Liguori
2008-06-04 9:53 ` Avi Kivity
2008-06-01 17:56 ` Marcelo Tosatti
From: Anthony Liguori @ 2008-06-01 16:56 UTC
To: Thomas Gleixner
Cc: Marcelo Tosatti, Avi Kivity, Chris Wright, Glauber Costa, kvm,
john stultz
Thomas Gleixner wrote:
> Can we please keep that code inside of drivers/clocksource/acpi_pm.c
> without creating a new disconnected file in drivers/char ?
>
> Btw, depending on the use case we might as well have a sysfs entry for that.
I think sysfs would actually make a lot of sense for this.
Regards,
Anthony Liguori
* Re: [patch 01/12] expose ACPI pmtimer to userspace (/dev/pmtimer)
2008-06-01 16:56 ` Anthony Liguori
@ 2008-06-04 9:53 ` Avi Kivity
2008-06-04 10:01 ` Thomas Gleixner
From: Avi Kivity @ 2008-06-04 9:53 UTC
To: Anthony Liguori
Cc: Thomas Gleixner, Marcelo Tosatti, Chris Wright, Glauber Costa,
kvm, john stultz
Anthony Liguori wrote:
> Thomas Gleixner wrote:
>> Can we please keep that code inside of drivers/clocksource/acpi_pm.c
>> without creating a new disconnected file in drivers/char ?
>>
>> Btw, depending on the use case we might as well have a sysfs entry
>> for that.
>
> I think sysfs would actually make a lot of sense for this.
>
It's read many thousands of times per second. You don't want a
read()/sprintf()/atoi() sequence every time.
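To make the difference Avi describes concrete, here is a userspace sketch of both access paths. It assumes the /dev/pmtimer semantics of pmtimer_read() in the patch above: a single read() of at least 4 bytes returns the masked counter as a native-endian u32, and shorter reads fail with -EINVAL. It also assumes a hypothetical sysfs-style attribute that holds the same value as decimal text. Both helper names are illustrative only:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Binary path: one read() of 4 bytes, no formatting or parsing.
 * Returns 0 on success, -1 on error or short read.
 */
static int pmtimer_read_binary(int fd, uint32_t *out)
{
	uint32_t v;
	ssize_t n = read(fd, &v, sizeof(v));

	if (n != (ssize_t)sizeof(v))
		return -1;
	*out = v;
	return 0;
}

/*
 * Text path (hypothetical attribute): the kernel has to format the value,
 * and userspace has to parse it on every sample.
 */
static int pmtimer_read_text(int fd, uint32_t *out)
{
	char buf[16];
	char *end;
	unsigned long v;
	ssize_t n = read(fd, buf, sizeof(buf) - 1);

	if (n <= 0)
		return -1;
	buf[n] = '\0';
	errno = 0;
	v = strtoul(buf, &end, 10);
	/* Reject parse errors and anything wider than the 24-bit counter. */
	if (errno || end == buf || v > 0xFFFFFFul)
		return -1;
	*out = (uint32_t)v;
	return 0;
}
```

Both helpers work on any descriptor, so a pipe can stand in for the device when trying them out. Real sysfs attributes also need an lseek() or a reopen before each fresh read, which adds another system call per sample.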
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
* Re: [patch 01/12] expose ACPI pmtimer to userspace (/dev/pmtimer)
2008-06-04 9:53 ` Avi Kivity
@ 2008-06-04 10:01 ` Thomas Gleixner
2008-06-04 10:35 ` Avi Kivity
From: Thomas Gleixner @ 2008-06-04 10:01 UTC
To: Avi Kivity
Cc: Anthony Liguori, Marcelo Tosatti, Chris Wright, Glauber Costa,
kvm, john stultz
On Wed, 4 Jun 2008, Avi Kivity wrote:
> Anthony Liguori wrote:
> > Thomas Gleixner wrote:
> > > Can we please keep that code inside of drivers/clocksource/acpi_pm.c
> > > without creating a new disconnected file in drivers/char ?
> > >
> > > Btw, depending on the use case we might as well have a sysfs entry for
> > > that.
> >
> > I think sysfs would actually make a lot of sense for this.
> >
>
> It's read many thousands of times per second. You don't want a
> read()/sprintf()/atoi() sequence every time.
Eek, according to Andrea it's only used for migration purposes.
Thanks,
tglx
* Re: [patch 01/12] expose ACPI pmtimer to userspace (/dev/pmtimer)
2008-06-04 10:01 ` Thomas Gleixner
@ 2008-06-04 10:35 ` Avi Kivity
From: Avi Kivity @ 2008-06-04 10:35 UTC
To: Thomas Gleixner
Cc: Anthony Liguori, Marcelo Tosatti, Chris Wright, Glauber Costa,
kvm, john stultz
Thomas Gleixner wrote:
> On Wed, 4 Jun 2008, Avi Kivity wrote:
>
>> Anthony Liguori wrote:
>>
>>> Thomas Gleixner wrote:
>>>
>>>> Can we please keep that code inside of drivers/clocksource/acpi_pm.c
>>>> without creating a new disconnected file in drivers/char ?
>>>>
>>>> Btw, depending on the use case we might as well have a sysfs entry for
>>>> that.
>>>>
>>> I think sysfs would actually make a lot of sense for this.
>>>
>>>
>> It's read many thousands of times per second. You don't want a
>> read()/sprintf()/atoi() sequence every time.
>>
>
> Eek, according to Andrea it's only used for migration purposes.
>
Oh, right. We also emulate pmtimer in qemu but it shouldn't need to
read the host pmtimer.
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
* Re: [patch 01/12] expose ACPI pmtimer to userspace (/dev/pmtimer)
2008-06-01 16:34 ` Thomas Gleixner
2008-06-01 16:56 ` Anthony Liguori
@ 2008-06-01 17:56 ` Marcelo Tosatti
2008-06-01 18:17 ` Thomas Gleixner
2008-06-02 16:43 ` John Stultz
From: Marcelo Tosatti @ 2008-06-01 17:56 UTC
To: Thomas Gleixner
Cc: Avi Kivity, Chris Wright, Glauber Costa, Anthony Liguori, kvm,
john stultz
On Sun, Jun 01, 2008 at 06:34:27PM +0200, Thomas Gleixner wrote:
> On Thu, 29 May 2008, Marcelo Tosatti wrote:
> > KVM wishes to allow direct guest access to the ACPI pmtimer. In that
> > case QEMU/KVM has to read the current value for migration, so the proper
> > syncing can be done on the destination.
>
> I don't understand from the above which problem you are trying to
> solve. Which pmtimer is read out, the one of the host (physical
> hardware) or the one of the guest (emulated hardware) ? What is synced
> at the destination ?
Problem is this:
We want to allow guests to directly access the host's pmtimer (by using
the I/O bitmap feature in VMX/SVM hardware). The advantage of doing it
is that no VMExits are necessary for guest pmtimer reads (which happen
often if we inform the guest that ACPI C1 state is supported, or if the
workload is gettimeofday() intensive).
If you migrate such a guest that has direct (ie. non-virtualized, using
the physical hardware) pmtimer access to a different host (destination),
you need to save the current host pmtimer value at the time of migration
so that you can either emulate it with a proper offset or synchronize
(wait for the destination host's real hardware pmtimer value to be in
sync before actually resuming guest execution).
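Here is a minimal sketch of the "emulate it with a proper offset" option. It assumes the guest-visible value is the destination host's raw 24-bit counter plus a constant offset, modulo 2^24. The helper names `pmtmr_migration_offset()` and `pmtmr_guest_value()` are hypothetical and are not part of KVM or QEMU:

```c
#include <stdint.h>

/* The ACPI PM timer is a 24-bit up-counter (ACPI_PM_MASK in the kernel). */
#define PMTMR_MASK 0xFFFFFFu

/*
 * Hypothetical helper: the offset that makes the destination host's raw
 * counter, sampled when the guest resumes, continue from the value saved
 * on the source host at migration time.
 */
static uint32_t pmtmr_migration_offset(uint32_t saved_src, uint32_t dst_at_resume)
{
	/* Unsigned subtraction wraps; masking keeps the result in 24 bits. */
	return (saved_src - dst_at_resume) & PMTMR_MASK;
}

/* Guest-visible value for an emulated pmtimer read on the destination. */
static uint32_t pmtmr_guest_value(uint32_t dst_now, uint32_t offset)
{
	return (dst_now + offset) & PMTMR_MASK;
}
```

Applying an offset means guest reads have to trap again, which gives up the VM exit savings. The "synchronize" alternative keeps direct access by waiting until the destination counter reaches the saved value.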
> > This patch will not register the device if the chipset has an unreliable
> > timer.
>
> Can we please keep that code inside of drivers/clocksource/acpi_pm.c
> without creating a new disconnected file in drivers/char ?
>
> Btw, depending on the use case we might as well have a sysfs entry for that.
A sysfs entry sounds fine and much simpler. Should probably be a generic
clocksource interface (so userspace can read any available clocksource)
rather than acpi_pm specific.
<snip>
> return clocksource_acpi_pm.read == acpi_pm_read;
>
> So we don't need reliable_pmtimer at all.
For KVM's use case, we'd rather not allow direct pmtimer access if the
host has an unreliable (buggy) chipset. But then, I doubt any of those
older affected chipsets have HW virtualization support, so it shouldn't
be an issue.
Thanks!
* Re: [patch 01/12] expose ACPI pmtimer to userspace (/dev/pmtimer)
2008-06-01 17:56 ` Marcelo Tosatti
@ 2008-06-01 18:17 ` Thomas Gleixner
2008-06-02 16:43 ` John Stultz
From: Thomas Gleixner @ 2008-06-01 18:17 UTC
To: Marcelo Tosatti
Cc: Avi Kivity, Chris Wright, Glauber Costa, Anthony Liguori, kvm,
john stultz
On Sun, 1 Jun 2008, Marcelo Tosatti wrote:
> On Sun, Jun 01, 2008 at 06:34:27PM +0200, Thomas Gleixner wrote:
>
> A sysfs entry sounds fine and much simpler. Should probably be a generic
> clocksource interface (so userspace can read any available clocksource)
> rather than acpi_pm specific.
Agreed.
> > return clocksource_acpi_pm.read == acpi_pm_read;
> >
> > So we don't need reliable_pmtimer at all.
>
> For KVM's use case, we'd rather not allow direct pmtimer access if the
> host has an unreliable (buggy) chipset.
well, "return clocksource_acpi_pm.read == acpi_pm_read;" is supposed
to do that just without an additional variable "reliable_pmtimer" :)
> But then, I doubt any of those older affected chipsets have HW
> virtualization support, so it shouldn't be an issue.
It's exactly one old crappy chipset, which definitely has no HW virt
support, and therefore we can just use read_pmtmr() w/o checking for
reliable or not.
Thanks,
tglx
* Re: [patch 01/12] expose ACPI pmtimer to userspace (/dev/pmtimer)
2008-06-01 17:56 ` Marcelo Tosatti
2008-06-01 18:17 ` Thomas Gleixner
@ 2008-06-02 16:43 ` John Stultz
2008-06-03 4:09 ` Marcelo Tosatti
From: John Stultz @ 2008-06-02 16:43 UTC
To: Marcelo Tosatti
Cc: Thomas Gleixner, Avi Kivity, Chris Wright, Glauber Costa,
Anthony Liguori, kvm
On Sun, 2008-06-01 at 14:56 -0300, Marcelo Tosatti wrote:
> On Sun, Jun 01, 2008 at 06:34:27PM +0200, Thomas Gleixner wrote:
> > On Thu, 29 May 2008, Marcelo Tosatti wrote:
> > > KVM wishes to allow direct guest access to the ACPI pmtimer. In that
> > > case QEMU/KVM has to read the current value for migration, so the proper
> > > syncing can be done on the destination.
> >
> > I don't understand from the above which problem you are trying to
> > solve. Which pmtimer is read out, the one of the host (physical
> > hardware) or the one of the guest (emulated hardware) ? What is synced
> > at the destination ?
>
> Problem is this:
>
> We want to allow guests to directly access the host's pmtimer (by using
> the I/O bitmap feature in VMX/SVM hardware). The advantage of doing it
> is that no VMExits are necessary for guest pmtimer reads (which happen
> often if we inform the guest that ACPI C1 state is supported, or if the
> workload is gettimeofday() intensive).
>
> If you migrate such a guest that has direct (ie. non-virtualized, using
> the physical hardware) pmtimer access to a different host (destination),
> you need to save the current host pmtimer value at the time of migration
> so that you can either emulate it with a proper offset or synchronize
> (wait for the destination host's real hardware pmtimer value to be in
> sync before actually resuming guest execution)
I'm a little wary of this. Another thing to catch here as well is host
suspend/resume cycles that might reset the pmtimer.
> > > This patch will not register the device if the chipset has an unreliable
> > > timer.
> >
> > Can we please keep that code inside of drivers/clocksource/acpi_pm.c
> > without creating a new disconnected file in drivers/char ?
> >
> > Btw, depending on the use case we might as well have a sysfs entry for that.
>
> A sysfs entry sounds fine and much simpler. Should probably be a generic
> clocksource interface (so userspace can read any available clocksource)
> rather than acpi_pm specific.
Again, I'd be hesitant to expose this stuff to userland since if the
counters reset (such as in the suspend/resume case) the applications may
not be aware.
And if it's a generic interface, we would then have to also export
frequency and mask values. It just gets messy, so I'd avoid doing
anything generic in exporting clocksources (since either userland wants
specific hardware and is aware of all the known troubles it may have, or
userland should use the existing kernel time interfaces).
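The following sketch shows what a generic export would have to publish alongside the raw value, following John's point. Without the counter's mask and frequency, userspace cannot turn two raw readings into elapsed time. The struct and helpers are hypothetical. The ACPI PM timer values used (24-bit mask, 3579545 Hz) are the standard ones. The sketch still cannot detect a counter reset across suspend/resume, which was John's other concern:

```c
#include <stdint.h>

/* Hypothetical description a generic clocksource export would need to carry. */
struct counter_desc {
	uint64_t mask;    /* valid counter bits: 0xFFFFFF for the 24-bit ACPI PM timer */
	uint64_t freq_hz; /* tick rate: 3579545 Hz for the ACPI PM timer */
};

/* Ticks from 'prev' to 'now'; correct as long as at most one wrap occurred. */
static uint64_t counter_delta(const struct counter_desc *c, uint64_t prev, uint64_t now)
{
	return (now - prev) & c->mask;
}

/* Ticks to nanoseconds; ticks must stay below ~1.8e10 so ticks * 1e9 fits in 64 bits. */
static uint64_t ticks_to_ns(const struct counter_desc *c, uint64_t ticks)
{
	return ticks * 1000000000ull / c->freq_hz;
}
```

At 3.579545 MHz a 24-bit counter wraps roughly every 4.7 seconds, so two samples taken further apart than that give a wrong delta with no way to tell.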
thanks
-john
* Re: [patch 01/12] expose ACPI pmtimer to userspace (/dev/pmtimer)
2008-06-02 16:43 ` John Stultz
@ 2008-06-03 4:09 ` Marcelo Tosatti
From: Marcelo Tosatti @ 2008-06-03 4:09 UTC
To: John Stultz
Cc: Thomas Gleixner, Avi Kivity, Chris Wright, Glauber Costa,
Anthony Liguori, kvm
On Mon, Jun 02, 2008 at 09:43:20AM -0700, John Stultz wrote:
> > If you migrate such a guest that has direct (ie. non-virtualized, using
> > the physical hardware) pmtimer access to a different host (destination),
> > you need to save the current host pmtimer value at the time of migration
> > so that you can either emulate it with a proper offset or synchronize
> > (wait for the destination host's real hardware pmtimer value to be in
> > sync before actually resuming guest execution)
>
> I'm a little wary of this. Another thing to catch here as well is host
> suspend/resume cycles that might reset the pmtimer.
In that case (host resume from an S-state) we can hold guest execution
until the real hw timer is close to the guest's expectation, or
fall back to emulation (but it's not unsolvable).
This problem can already happen today with the TSC, since kvm's suspend
routine isn't saving it.
Thanks for the reminder.
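Here is one way to express the "hold guest execution or fall back to emulation" choice. It is written as a hypothetical policy helper, assuming the guest must never see the 24-bit counter go backwards. A small forward jump is tolerated, a short lag is waited out, and anything else switches to offset-based emulation. The enum, function name, and thresholds are illustrative, not KVM code:

```c
#include <stdint.h>

#define PMTMR_MASK 0xFFFFFFu

enum pmtmr_action { PMTMR_PASSTHROUGH, PMTMR_WAIT, PMTMR_EMULATE };

/*
 * expected:  counter value the guest expects to read next
 * hw_now:    current raw hardware counter on this host
 * tolerance: largest forward jump (in ticks) the guest may observe
 * max_wait:  longest stall (in ticks) we accept before giving up
 */
static enum pmtmr_action pmtmr_resume_policy(uint32_t expected, uint32_t hw_now,
					     uint32_t tolerance, uint32_t max_wait,
					     uint32_t *wait_ticks)
{
	/* Distances modulo 2^24 in both directions. */
	uint32_t ahead = (hw_now - expected) & PMTMR_MASK;
	uint32_t behind = (expected - hw_now) & PMTMR_MASK;

	*wait_ticks = 0;
	if (ahead <= tolerance)
		return PMTMR_PASSTHROUGH;	/* close enough: resume with direct access */
	if (behind <= max_wait) {
		*wait_ticks = behind;		/* hold the guest until hardware catches up */
		return PMTMR_WAIT;
	}
	return PMTMR_EMULATE;			/* too far off: trap reads and apply an offset */
}
```

Because the counter wraps every ~4.7 s, "ahead" and "behind" are only meaningful relative to each other. The helper treats whichever interpretation is within its threshold as the real one.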
> > > > This patch will not register the device if the chipset has an unreliable
> > > > timer.
> > >
> > > Can we please keep that code inside of drivers/clocksource/acpi_pm.c
> > > without creating a new disconnected file in drivers/char ?
> > >
> > > Btw, depending on the use case we might as well have a sysfs entry for that.
> >
> > A sysfs entry sounds fine and much simpler. Should probably be a generic
> > clocksource interface (so userspace can read any available clocksource)
> > rather than acpi_pm specific.
>
> Again, I'd be hesitant to expose this stuff to userland since if the
> counters reset (such as in the suspend/resume case) the applications may
> not be aware.
>
> And if it's a generic interface, we would then have to also export
> frequency and mask values. It just gets messy, so I'd avoid doing
> anything generic in exporting clocksources (since either userland wants
> specific hardware and is aware of all the known troubles it may have, or
> userland should use the existing kernel time interfaces).
Good point. Will go for a private sysfs file in acpi_pm.
* [patch 02/12] KVM: allow multiple IO bitmap pages, provide userspace interface
2008-05-29 22:22 [patch 00/12] fake ACPI C2 emulation v2 Marcelo Tosatti
2008-05-29 22:22 ` [patch 01/12] expose ACPI pmtimer to userspace (/dev/pmtimer) Marcelo Tosatti
@ 2008-05-29 22:22 ` Marcelo Tosatti
2008-05-29 22:22 ` [patch 03/12] KVM: allow userspace to open access to ACPI pmtimer Marcelo Tosatti
From: Marcelo Tosatti @ 2008-05-29 22:22 UTC
To: Avi Kivity
Cc: Chris Wright, Glauber Costa, Anthony Liguori, kvm,
Marcelo Tosatti
[-- Attachment #1: kvm-open-pmtimer --]
[-- Type: text/plain, Size: 20262 bytes --]
Allow multiple IO bitmap pages and provide an interface for userspace
to control which ports are open for a particular guest.
Only tested on Intel VMX.
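The VMX part of this patch builds two 4 KiB bitmaps from the userspace port list:
- Bitmap A covers ports 0x0000-0x7fff.
- Bitmap B covers 0x8000-0xffff.

A set bit means "intercept" and a clear bit means "direct guest access". The userspace model below mirrors vmx_build_io_bitmap() under those assumptions. The struct and function names are illustrative. Port 0xb008 in the usage is only an example PM timer address, since the real one comes from the ACPI FADT. When a range runs past the end of a bitmap, the model uses `break`, so later ranges that do fall inside that bitmap are still processed:

```c
#include <stdint.h>
#include <string.h>

#define IO_BITMAP_BYTES 4096                    /* one 4 KiB page */
#define PORTS_PER_BITMAP (IO_BITMAP_BYTES * 8)  /* 32768 ports per page */

/* Stand-in for the patch's struct kvm_ioport { addr; len; } (field types assumed). */
struct ioport_range {
	uint32_t addr;
	uint32_t len;
};

/* Fill one bitmap covering ports [start, start + PORTS_PER_BITMAP). */
static void build_io_bitmap(uint8_t *bm, const struct ioport_range *r, int nranges,
			    uint32_t start)
{
	uint32_t limit = start + PORTS_PER_BITMAP - 1;

	memset(bm, 0xff, IO_BITMAP_BYTES);  /* intercept everything by default */
	for (int i = 0; i < nranges; i++) {
		for (uint32_t n = r[i].addr; n < r[i].addr + r[i].len; n++) {
			if (n < start)
				continue;
			if (n > limit)
				break;  /* rest of this range lies beyond this bitmap */
			/* Little-endian bit order, as x86 clear_bit() uses. */
			bm[(n - start) / 8] &= (uint8_t)~(1u << ((n - start) % 8));
		}
	}
}

/* Returns 1 if an access to 'port' would cause a VM exit. */
static int port_intercepted(const uint8_t *a, const uint8_t *b, uint16_t port)
{
	const uint8_t *bm = port < PORTS_PER_BITMAP ? a : b;
	uint32_t bit = port % PORTS_PER_BITMAP;

	return (bm[bit / 8] >> (bit % 8)) & 1;
}
```

Because identical guests produce byte-identical bitmaps, the patch can share one page between them and compare contents with memcmp() before allocating a new one.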
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Index: kvm/arch/x86/kvm/kvm_svm.h
===================================================================
--- kvm.orig/arch/x86/kvm/kvm_svm.h
+++ kvm/arch/x86/kvm/kvm_svm.h
@@ -41,6 +41,7 @@ struct vcpu_svm {
unsigned long host_dr7;
u32 *msrpm;
+ struct kvm_io_bitmap_page *io_bitmap_page;
};
#endif
Index: kvm/arch/x86/kvm/svm.c
===================================================================
--- kvm.orig/arch/x86/kvm/svm.c
+++ kvm/arch/x86/kvm/svm.c
@@ -24,6 +24,7 @@
#include <linux/vmalloc.h>
#include <linux/highmem.h>
#include <linux/sched.h>
+#include <linux/kref.h>
#include <asm/desc.h>
@@ -68,7 +69,10 @@ static inline struct vcpu_svm *to_svm(st
return container_of(vcpu, struct vcpu_svm, vcpu);
}
-static unsigned long iopm_base;
+static LIST_HEAD(io_bitmap_pages);
+static DEFINE_MUTEX(io_bitmap_mutex);
+
+static struct kvm_io_bitmap_page *main_iopm_page;
struct kvm_ldttss_desc {
u16 limit0;
@@ -413,18 +417,28 @@ static __init int svm_hardware_setup(voi
{
int cpu;
struct page *iopm_pages;
+ struct kvm_io_bitmap_page *bitmap_page;
void *iopm_va;
int r;
- iopm_pages = alloc_pages(GFP_KERNEL, IOPM_ALLOC_ORDER);
+ bitmap_page = kmalloc(sizeof(struct kvm_io_bitmap_page), GFP_KERNEL);
+ if (!bitmap_page)
+ return -ENOMEM;
+ kref_init(&bitmap_page->ref);
+ INIT_LIST_HEAD(&bitmap_page->io_bitmap_list);
- if (!iopm_pages)
+ iopm_pages = alloc_pages(GFP_KERNEL, IOPM_ALLOC_ORDER);
+ if (!iopm_pages) {
+ kfree(bitmap_page);
return -ENOMEM;
+ }
iopm_va = page_address(iopm_pages);
memset(iopm_va, 0xff, PAGE_SIZE * (1 << IOPM_ALLOC_ORDER));
clear_bit(0x80, iopm_va); /* allow direct access to PC debug port */
- iopm_base = page_to_pfn(iopm_pages) << PAGE_SHIFT;
+ bitmap_page->page = iopm_pages;
+ list_add(&bitmap_page->io_bitmap_list, &io_bitmap_pages);
+ main_iopm_page = bitmap_page;
if (boot_cpu_has(X86_FEATURE_NX))
kvm_enable_efer_bits(EFER_NX);
@@ -454,14 +468,16 @@ static __init int svm_hardware_setup(voi
err:
__free_pages(iopm_pages, IOPM_ALLOC_ORDER);
- iopm_base = 0;
+ list_del(&bitmap_page->io_bitmap_list);
+ kfree(bitmap_page);
return r;
}
static __exit void svm_hardware_unsetup(void)
{
- __free_pages(pfn_to_page(iopm_base >> PAGE_SHIFT), IOPM_ALLOC_ORDER);
- iopm_base = 0;
+ __free_pages(main_iopm_page->page, IOPM_ALLOC_ORDER);
+ list_del(&main_iopm_page->io_bitmap_list);
+ kfree(main_iopm_page);
}
static void init_seg(struct vmcb_seg *seg)
@@ -534,7 +550,7 @@ static void init_vmcb(struct vcpu_svm *s
(1ULL << INTERCEPT_MONITOR) |
(1ULL << INTERCEPT_MWAIT);
- control->iopm_base_pa = iopm_base;
+ control->iopm_base_pa = page_to_pfn(svm->io_bitmap_page->page) << PAGE_SHIFT;
control->msrpm_base_pa = __pa(svm->msrpm);
control->tsc_offset = 0;
control->int_ctl = V_INTR_MASKING_MASK;
@@ -641,6 +657,8 @@ static struct kvm_vcpu *svm_create_vcpu(
svm->msrpm = page_address(msrpm_pages);
svm_vcpu_init_msrpm(svm->msrpm);
+ svm->io_bitmap_page = kvm->arch.io_bitmap_a;
+
svm->vmcb = page_address(page);
clear_page(svm->vmcb);
svm->vmcb_pa = page_to_pfn(page) << PAGE_SHIFT;
@@ -1912,6 +1930,98 @@ static int get_npt_level(void)
#endif
}
+static struct kvm_io_bitmap_page *svm_find_io_bitmap_page(void *va)
+{
+ struct kvm_io_bitmap_page *bitmap_page;
+ struct page *iopm_pages;
+
+ list_for_each_entry(bitmap_page, &io_bitmap_pages, io_bitmap_list) {
+ bool found;
+ found = !memcmp(va, page_address(bitmap_page->page),
+ PAGE_SIZE * (1 << IOPM_ALLOC_ORDER));
+ if (found) {
+ kref_get(&bitmap_page->ref);
+ return bitmap_page;
+ }
+ }
+
+ bitmap_page = kmalloc(sizeof(struct kvm_io_bitmap_page), GFP_KERNEL);
+ if (!bitmap_page)
+ return NULL;
+
+ iopm_pages = alloc_pages(GFP_KERNEL, IOPM_ALLOC_ORDER);
+ if (!iopm_pages) {
+ kfree(bitmap_page);
+ return NULL;
+ }
+ memcpy(page_address(iopm_pages), va, PAGE_SIZE * (1 << IOPM_ALLOC_ORDER));
+ bitmap_page->page = iopm_pages;
+ kref_init(&bitmap_page->ref);
+ INIT_LIST_HEAD(&bitmap_page->io_bitmap_list);
+ list_add(&bitmap_page->io_bitmap_list, &io_bitmap_pages);
+
+ return bitmap_page;
+}
+
+static void svm_build_io_bitmap(void *va, struct kvm_ioport_list *ioports)
+{
+ int i;
+
+ memset(va, 0xff, PAGE_SIZE * (1 << IOPM_ALLOC_ORDER));
+ for (i = 0; i < ioports->nranges; i++) {
+ struct kvm_ioport *ioport = &ioports->ioports[i];
+ __u32 addr, n;
+
+ addr = ioport->addr;
+ for (n = addr; n < addr + ioport->len; n++)
+ clear_bit(n, va);
+ }
+}
+
+static int svm_open_io_ports(struct kvm *kvm, struct kvm_ioport_list *ioports)
+{
+ void *area;
+ struct kvm_io_bitmap_page *bitmap_page;
+ int ret = -ENOMEM;
+
+ mutex_lock(&io_bitmap_mutex);
+ area = vmalloc(PAGE_SIZE * (1 << IOPM_ALLOC_ORDER));
+ if (!area)
+ goto out_unlock;
+ svm_build_io_bitmap(area, ioports);
+ bitmap_page = svm_find_io_bitmap_page(area);
+ vfree(area);
+ if (!bitmap_page)
+ goto out_unlock;
+
+ kvm->arch.io_bitmap_a = bitmap_page;
+ ret = 0;
+out_unlock:
+ mutex_unlock(&io_bitmap_mutex);
+ return ret;
+}
+
+static void svm_io_bitmap_kref_put(struct kref *kref)
+{
+ struct kvm_io_bitmap_page *page =
+ container_of(kref, struct kvm_io_bitmap_page, ref);
+
+ mutex_lock(&io_bitmap_mutex);
+ list_del(&page->io_bitmap_list);
+ __free_pages(page->page, IOPM_ALLOC_ORDER);
+ kfree(page);
+ mutex_unlock(&io_bitmap_mutex);
+}
+
+static void svm_release_io_bitmaps(struct kvm *kvm)
+{
+ struct kvm_io_bitmap_page *io_bitmap;
+
+ io_bitmap = kvm->arch.io_bitmap_a;
+ if (io_bitmap)
+ kref_put(&io_bitmap->ref, svm_io_bitmap_kref_put);
+}
+
static struct kvm_x86_ops svm_x86_ops = {
.cpu_has_kvm_support = has_svm,
.disabled_by_bios = is_disabled,
@@ -1969,6 +2079,8 @@ static struct kvm_x86_ops svm_x86_ops =
.set_tss_addr = svm_set_tss_addr,
.get_tdp_level = get_npt_level,
+ .open_io_ports = svm_open_io_ports,
+ .release_io_bitmaps = svm_release_io_bitmaps,
};
static int __init svm_init(void)
Index: kvm/arch/x86/kvm/vmx.c
===================================================================
--- kvm.orig/arch/x86/kvm/vmx.c
+++ kvm/arch/x86/kvm/vmx.c
@@ -26,6 +26,7 @@
#include <linux/highmem.h>
#include <linux/sched.h>
#include <linux/moduleparam.h>
+#include <linux/kref.h>
#include <asm/io.h>
#include <asm/desc.h>
@@ -47,6 +48,10 @@ module_param(flexpriority_enabled, bool,
static int enable_ept = 1;
module_param(enable_ept, bool, 0);
+static LIST_HEAD(io_bitmap_pages_a);
+static LIST_HEAD(io_bitmap_pages_b);
+static DEFINE_MUTEX(io_bitmap_mutex);
+
struct vmcs {
u32 revision_id;
u32 abort;
@@ -83,6 +88,8 @@ struct vcpu_vmx {
} irq;
} rmode;
int vpid;
+ struct kvm_io_bitmap_page *io_bitmap_a;
+ struct kvm_io_bitmap_page *io_bitmap_b;
};
static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
@@ -96,8 +103,6 @@ static DEFINE_PER_CPU(struct vmcs *, vmx
static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
static DEFINE_PER_CPU(struct list_head, vcpus_on_cpu);
-static struct page *vmx_io_bitmap_a;
-static struct page *vmx_io_bitmap_b;
static struct page *vmx_msr_bitmap;
static DECLARE_BITMAP(vmx_vpid_bitmap, VMX_NR_VPIDS);
@@ -1875,8 +1880,8 @@ static int vmx_vcpu_setup(struct vcpu_vm
u32 exec_control;
/* I/O */
- vmcs_write64(IO_BITMAP_A, page_to_phys(vmx_io_bitmap_a));
- vmcs_write64(IO_BITMAP_B, page_to_phys(vmx_io_bitmap_b));
+ vmcs_write64(IO_BITMAP_A, page_to_phys(vmx->io_bitmap_a->page));
+ vmcs_write64(IO_BITMAP_B, page_to_phys(vmx->io_bitmap_b->page));
if (cpu_has_vmx_msr_bitmap())
vmcs_write64(MSR_BITMAP, page_to_phys(vmx_msr_bitmap));
@@ -3126,6 +3131,9 @@ static struct kvm_vcpu *vmx_create_vcpu(
vmcs_clear(vmx->vmcs);
+ vmx->io_bitmap_a = kvm->arch.io_bitmap_a;
+ vmx->io_bitmap_b = kvm->arch.io_bitmap_b;
+
cpu = get_cpu();
vmx_vcpu_load(&vmx->vcpu, cpu);
err = vmx_vcpu_setup(vmx);
@@ -3175,6 +3183,131 @@ static int get_ept_level(void)
return VMX_EPT_DEFAULT_GAW + 1;
}
+
+static struct kvm_io_bitmap_page *
+vmx_find_io_bitmap_page(struct list_head *list, struct page *page,
+ void *va_new_pg)
+{
+ struct kvm_io_bitmap_page *bitmap_page;
+
+ list_for_each_entry(bitmap_page, list, io_bitmap_list) {
+ bool found;
+ void *va;
+
+ va = kmap(bitmap_page->page);
+ found = !memcmp(va_new_pg, va, PAGE_SIZE);
+ kunmap(bitmap_page->page);
+ if (found) {
+ kref_get(&bitmap_page->ref);
+ return bitmap_page;
+ }
+ }
+
+ bitmap_page = kmalloc(sizeof(struct kvm_io_bitmap_page), GFP_KERNEL);
+ if (!bitmap_page)
+ return NULL;
+
+ get_page(page);
+ bitmap_page->page = page;
+ kref_init(&bitmap_page->ref);
+ INIT_LIST_HEAD(&bitmap_page->io_bitmap_list);
+ list_add(&bitmap_page->io_bitmap_list, list);
+
+ return bitmap_page;
+}
+
+static void vmx_build_io_bitmap(void *va, struct kvm_ioport_list *ioports,
+ unsigned int start, unsigned int limit)
+{
+ int i;
+
+ memset(va, 0xff, PAGE_SIZE);
+ for (i = 0; i < ioports->nranges; i++) {
+ struct kvm_ioport *ioport = &ioports->ioports[i];
+ __u32 addr, n;
+
+ addr = ioport->addr;
+ for (n = addr; n < addr + ioport->len; n++) {
+ if (n < start)
+ continue;
+ if (n > limit)
+ break; /* rest of this range is past the bitmap; later ranges may not be */
+ clear_bit(n - start, va);
+ }
+ }
+}
+
+static int vmx_open_io_ports(struct kvm *kvm, struct kvm_ioport_list *ioports)
+{
+ struct page *page;
+ struct kvm_io_bitmap_page *bitmap_page_a, *bitmap_page_b;
+ void *va;
+ int ret = -ENOMEM;
+
+ mutex_lock(&io_bitmap_mutex);
+ page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM);
+ if (!page)
+ goto out_unlock;
+
+ va = kmap(page);
+ vmx_build_io_bitmap(va, ioports, 0, 0x7fff);
+ bitmap_page_a = vmx_find_io_bitmap_page(&io_bitmap_pages_a, page, va);
+ kunmap(page);
+ __free_page(page);
+
+ page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM);
+ if (!page)
+ goto out_unlock;
+
+ va = kmap(page);
+ vmx_build_io_bitmap(va, ioports, 0x8000, 0xffff);
+ bitmap_page_b = vmx_find_io_bitmap_page(&io_bitmap_pages_b, page, va);
+ kunmap(page);
+ __free_page(page);
+
+ if (!bitmap_page_a || !bitmap_page_b)
+ goto out_unlock;
+
+ kvm->arch.io_bitmap_a = bitmap_page_a;
+ kvm->arch.io_bitmap_b = bitmap_page_b;
+
+ ret = 0;
+out_unlock:
+ mutex_unlock(&io_bitmap_mutex);
+ return ret;
+
+}
+
+static void vmx_io_bitmap_kref_put(struct kref *kref)
+{
+ struct kvm_io_bitmap_page *page =
+ container_of(kref, struct kvm_io_bitmap_page, ref);
+
+ mutex_lock(&io_bitmap_mutex);
+ list_del(&page->io_bitmap_list);
+ __free_page(page->page);
+ kfree(page);
+ mutex_unlock(&io_bitmap_mutex);
+}
+
+static void vmx_release_io_bitmaps(struct kvm *kvm)
+{
+ struct kvm_io_bitmap_page *io_bitmap_a, *io_bitmap_b;
+
+ io_bitmap_a = kvm->arch.io_bitmap_a;
+ io_bitmap_b = kvm->arch.io_bitmap_b;
+
+ WARN_ON(!kvm->arch.io_bitmap_a);
+ WARN_ON(!kvm->arch.io_bitmap_b);
+
+ if (io_bitmap_a)
+ kref_put(&io_bitmap_a->ref, vmx_io_bitmap_kref_put);
+ if (io_bitmap_b)
+ kref_put(&io_bitmap_b->ref, vmx_io_bitmap_kref_put);
+}
+
static struct kvm_x86_ops vmx_x86_ops = {
.cpu_has_kvm_support = cpu_has_kvm_support,
.disabled_by_bios = vmx_disabled_by_bios,
@@ -3231,6 +3364,9 @@ static struct kvm_x86_ops vmx_x86_ops =
.set_tss_addr = vmx_set_tss_addr,
.get_tdp_level = get_ept_level,
+
+ .open_io_ports = vmx_open_io_ports,
+ .release_io_bitmaps = vmx_release_io_bitmaps,
};
static int __init vmx_init(void)
@@ -3238,35 +3374,12 @@ static int __init vmx_init(void)
void *va;
int r;
- vmx_io_bitmap_a = alloc_page(GFP_KERNEL | __GFP_HIGHMEM);
- if (!vmx_io_bitmap_a)
- return -ENOMEM;
-
- vmx_io_bitmap_b = alloc_page(GFP_KERNEL | __GFP_HIGHMEM);
- if (!vmx_io_bitmap_b) {
- r = -ENOMEM;
- goto out;
- }
-
vmx_msr_bitmap = alloc_page(GFP_KERNEL | __GFP_HIGHMEM);
if (!vmx_msr_bitmap) {
r = -ENOMEM;
- goto out1;
+ goto out;
}
- /*
- * Allow direct access to the PC debug port (it is often used for I/O
- * delays, but the vmexits simply slow things down).
- */
- va = kmap(vmx_io_bitmap_a);
- memset(va, 0xff, PAGE_SIZE);
- clear_bit(0x80, va);
- kunmap(vmx_io_bitmap_a);
-
- va = kmap(vmx_io_bitmap_b);
- memset(va, 0xff, PAGE_SIZE);
- kunmap(vmx_io_bitmap_b);
-
va = kmap(vmx_msr_bitmap);
memset(va, 0xff, PAGE_SIZE);
kunmap(vmx_msr_bitmap);
@@ -3275,7 +3388,7 @@ static int __init vmx_init(void)
r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx), THIS_MODULE);
if (r)
- goto out2;
+ goto out1;
vmx_disable_intercept_for_msr(vmx_msr_bitmap, MSR_FS_BASE);
vmx_disable_intercept_for_msr(vmx_msr_bitmap, MSR_GS_BASE);
@@ -3293,20 +3406,15 @@ static int __init vmx_init(void)
return 0;
-out2:
- __free_page(vmx_msr_bitmap);
out1:
- __free_page(vmx_io_bitmap_b);
+ __free_page(vmx_msr_bitmap);
out:
- __free_page(vmx_io_bitmap_a);
return r;
}
static void __exit vmx_exit(void)
{
__free_page(vmx_msr_bitmap);
- __free_page(vmx_io_bitmap_b);
- __free_page(vmx_io_bitmap_a);
kvm_exit();
}
Index: kvm/arch/x86/kvm/x86.c
===================================================================
--- kvm.orig/arch/x86/kvm/x86.c
+++ kvm/arch/x86/kvm/x86.c
@@ -447,6 +447,8 @@ static u32 emulated_msrs[] = {
MSR_IA32_MISC_ENABLE,
};
+static struct kvm_ioport_list *allowed_open_ioports;
+
static void set_efer(struct kvm_vcpu *vcpu, u64 efer)
{
if (efer & efer_reserved_bits) {
@@ -790,6 +792,7 @@ int kvm_dev_ioctl_check_extension(long e
case KVM_CAP_PIT:
case KVM_CAP_NOP_IO_DELAY:
case KVM_CAP_MP_STATE:
+ case KVM_CAP_OPEN_IOPORT:
r = 1;
break;
case KVM_CAP_VAPIC:
@@ -1494,6 +1497,56 @@ static int kvm_vm_ioctl_set_pit(struct k
return r;
}
+static int kvm_vm_ioctl_set_ioport(struct kvm *kvm, __u32 nranges,
+ struct kvm_ioport_list __user *ioports)
+{
+ struct kvm_ioport_list *ioport_entries;
+ int r, i, nr_ports_ok = 0;
+
+ r = -E2BIG;
+ if (nranges > KVM_MAX_IOPORT_RANGES)
+ goto out;
+ r = -ENOMEM;
+ ioport_entries = vmalloc(nranges * sizeof(struct kvm_ioport) +
+ sizeof(struct kvm_ioport_list));
+ if (!ioport_entries)
+ goto out;
+ r = -EFAULT;
+ if (copy_from_user(ioport_entries, ioports,
+ nranges * sizeof(struct kvm_ioport) +
+ sizeof(struct kvm_ioport_list)))
+ goto out_free;
+ r = -EPERM;
+ if (ioport_entries->nranges != nranges)
+ goto out_free;
+
+ for (i = 0; i < ioport_entries->nranges; i++) {
+ int n;
+ struct kvm_ioport *user_ioport = &ioport_entries->ioports[i];
+
+ for (n = 0; n < allowed_open_ioports->nranges; n++) {
+ struct kvm_ioport *allowed_ioport;
+ allowed_ioport = &allowed_open_ioports->ioports[n];
+ if (user_ioport->addr != allowed_ioport->addr ||
+ user_ioport->len > allowed_ioport->len)
+ continue;
+ else {
+ nr_ports_ok++;
+ break;
+ }
+ }
+ }
+ if (nr_ports_ok != ioport_entries->nranges)
+ goto out_free;
+
+ r = kvm_x86_ops->open_io_ports(kvm, ioport_entries);
+
+out_free:
+ vfree(ioport_entries);
+out:
+ return r;
+}
+
/*
* Get (and clear) the dirty memory log for a memory slot.
*/
@@ -1678,6 +1731,16 @@ long kvm_arch_vm_ioctl(struct file *filp
r = 0;
break;
}
+ case KVM_SET_OPEN_IOPORT: {
+ struct kvm_ioport_list ioport_list;
+ r = -EFAULT;
+ if (copy_from_user(&ioport_list, argp, sizeof ioport_list))
+ goto out;
+ r = kvm_vm_ioctl_set_ioport(kvm, ioport_list.nranges, argp);
+ if (r)
+ goto out;
+ break;
+ }
default:
;
}
@@ -1700,6 +1763,25 @@ static void kvm_init_msr_list(void)
num_msrs_to_save = j;
}
+static int kvm_init_ioport_list(void)
+{
+ allowed_open_ioports = kmalloc(sizeof(struct kvm_ioport) +
+ sizeof(struct kvm_ioport_list), GFP_KERNEL);
+ if (!allowed_open_ioports)
+ return -ENOMEM;
+ allowed_open_ioports->nranges = 1;
+ allowed_open_ioports->ioports[0].addr = 0x80;
+ allowed_open_ioports->ioports[0].len = 1;
+
+ return 0;
+}
+
+static void kvm_exit_ioport_list(void)
+{
+ kfree(allowed_open_ioports);
+ allowed_open_ioports = NULL;
+}
+
/*
* Only apic need an MMIO device hook, so shortcut now..
*/
@@ -2384,6 +2466,9 @@ int kvm_arch_init(void *opaque)
r = kvm_mmu_module_init();
if (r)
goto out;
+ r = kvm_init_ioport_list();
+ if (r)
+ goto out_exit_mmu;
kvm_init_msr_list();
@@ -2394,6 +2479,8 @@ int kvm_arch_init(void *opaque)
PT_DIRTY_MASK, PT64_NX_MASK, 0);
return 0;
+out_exit_mmu:
+ kvm_mmu_module_exit();
out:
return r;
}
@@ -2402,6 +2489,7 @@ void kvm_arch_exit(void)
{
kvm_x86_ops = NULL;
kvm_mmu_module_exit();
+ kvm_exit_ioport_list();
}
int kvm_emulate_halt(struct kvm_vcpu *vcpu)
@@ -3859,14 +3947,43 @@ void kvm_arch_vcpu_uninit(struct kvm_vcp
free_page((unsigned long)vcpu->arch.pio_data);
}
+static int kvm_open_def_ioports(struct kvm *kvm)
+{
+ struct kvm_ioport_list *ioports;
+ int ret;
+
+ ioports = kzalloc(sizeof(struct kvm_ioport) +
+ sizeof(struct kvm_ioport_list), GFP_KERNEL);
+ if (!ioports)
+ return -ENOMEM;
+
+ /*
+ * Allow direct access to the PC debug port (it is often used for I/O
+ * delays, but the vmexits simply slow things down).
+ */
+ ioports->nranges = 1;
+ ioports->ioports[0].addr = 0x80;
+ ioports->ioports[0].len = 1;
+
+ ret = kvm_x86_ops->open_io_ports(kvm, ioports);
+ kfree(ioports);
+ return ret;
+}
+
struct kvm *kvm_arch_create_vm(void)
{
+ int ret;
struct kvm *kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL);
if (!kvm)
return ERR_PTR(-ENOMEM);
INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
+ ret = kvm_open_def_ioports(kvm);
+ if (ret) {
+ kfree(kvm);
+ kvm = ERR_PTR(ret);
+ }
return kvm;
}
@@ -3908,6 +4025,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm
put_page(kvm->arch.apic_access_page);
if (kvm->arch.ept_identity_pagetable)
put_page(kvm->arch.ept_identity_pagetable);
+ kvm_x86_ops->release_io_bitmaps(kvm);
kfree(kvm);
}
Index: kvm/include/asm-x86/kvm.h
===================================================================
--- kvm.orig/include/asm-x86/kvm.h
+++ kvm/include/asm-x86/kvm.h
@@ -209,6 +209,17 @@ struct kvm_pit_state {
struct kvm_pit_channel_state channels[3];
};
+/* for KVM_SET_OPEN_IOPORT */
+struct kvm_ioport {
+ __u32 addr;
+ __u32 len;
+};
+
+struct kvm_ioport_list {
+ __u32 nranges;
+ struct kvm_ioport ioports[0];
+};
+
#define KVM_TRC_INJ_VIRQ (KVM_TRC_HANDLER + 0x02)
#define KVM_TRC_REDELIVER_EVT (KVM_TRC_HANDLER + 0x03)
#define KVM_TRC_PEND_INTR (KVM_TRC_HANDLER + 0x04)
Index: kvm/include/asm-x86/kvm_host.h
===================================================================
--- kvm.orig/include/asm-x86/kvm_host.h
+++ kvm/include/asm-x86/kvm_host.h
@@ -78,6 +78,7 @@
#define KVM_MIN_FREE_MMU_PAGES 5
#define KVM_REFILL_PAGES 25
#define KVM_MAX_CPUID_ENTRIES 40
+#define KVM_MAX_IOPORT_RANGES 100
extern spinlock_t kvm_lock;
extern struct list_head vm_list;
@@ -296,6 +297,12 @@ struct kvm_mem_alias {
gfn_t target_gfn;
};
+struct kvm_io_bitmap_page {
+ struct page *page;
+ struct list_head io_bitmap_list;
+ struct kref ref;
+};
+
struct kvm_arch{
int naliases;
struct kvm_mem_alias aliases[KVM_ALIAS_SLOTS];
@@ -320,6 +327,9 @@ struct kvm_arch{
struct page *ept_identity_pagetable;
bool ept_identity_pagetable_done;
+
+ struct kvm_io_bitmap_page *io_bitmap_a;
+ struct kvm_io_bitmap_page *io_bitmap_b;
};
struct kvm_vm_stat {
@@ -429,6 +439,9 @@ struct kvm_x86_ops {
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
int (*get_tdp_level)(void);
+
+ int (*open_io_ports)(struct kvm *kvm, struct kvm_ioport_list *ioports);
+ void (*release_io_bitmaps)(struct kvm *kvm);
};
extern struct kvm_x86_ops *kvm_x86_ops;
Index: kvm/include/linux/kvm.h
===================================================================
--- kvm.orig/include/linux/kvm.h
+++ kvm/include/linux/kvm.h
@@ -346,6 +346,7 @@ struct kvm_trace_rec {
#define KVM_CAP_NOP_IO_DELAY 12
#define KVM_CAP_PV_MMU 13
#define KVM_CAP_MP_STATE 14
+#define KVM_CAP_OPEN_IOPORT 15
/*
* ioctls for VM fds
@@ -371,6 +372,7 @@ struct kvm_trace_rec {
#define KVM_CREATE_PIT _IO(KVMIO, 0x64)
#define KVM_GET_PIT _IOWR(KVMIO, 0x65, struct kvm_pit_state)
#define KVM_SET_PIT _IOR(KVMIO, 0x66, struct kvm_pit_state)
+#define KVM_SET_OPEN_IOPORT _IOR(KVMIO, 0x67, struct kvm_ioport_list)
/*
* ioctls for vcpu fds
--
* [patch 03/12] KVM: allow userspace to open access to ACPI pmtimer
2008-05-29 22:22 [patch 00/12] fake ACPI C2 emulation v2 Marcelo Tosatti
2008-05-29 22:22 ` [patch 01/12] expose ACPI pmtimer to userspace (/dev/pmtimer) Marcelo Tosatti
2008-05-29 22:22 ` [patch 02/12] KVM: allow multiple IO bitmap pages, provide userspace interface Marcelo Tosatti
@ 2008-05-29 22:22 ` Marcelo Tosatti
2008-05-29 22:22 ` [patch 04/12] KVM: move muldiv64 to x86.c, export Marcelo Tosatti
` (9 subsequent siblings)
12 siblings, 0 replies; 27+ messages in thread
From: Marcelo Tosatti @ 2008-05-29 22:22 UTC
To: Avi Kivity
Cc: Chris Wright, Glauber Costa, Anthony Liguori, kvm,
Marcelo Tosatti
[-- Attachment #1: kvm-open-pmtimer-real --]
[-- Type: text/plain, Size: 1041 bytes --]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
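A minimal userspace sketch of how a KVM_SET_OPEN_IOPORT request is handled once this patch adds the PM timer to the whitelist. Each requested range must start at a whitelisted address and be no longer than the whitelisted length (as in kvm_vm_ioctl_set_ioport()); each accepted port then has its bit cleared in the VMX I/O bitmaps, where bitmap A covers ports 0x0000-0x7fff, bitmap B covers 0x8000-0xffff, and a set bit causes a VM exit. The struct and function names are hypothetical, and 0xb008 only stands in for whatever pmtmr_ioport the host reports.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct kvm_ioport. */
struct ioport_range {
	uint32_t addr, len;
};

/* Accept a request only if it starts at a whitelisted address and
 * is no longer than the whitelisted length. */
static bool range_allowed(const struct ioport_range *req,
			  const struct ioport_range *allow, int nallow)
{
	for (int i = 0; i < nallow; i++)
		if (req->addr == allow[i].addr && req->len <= allow[i].len)
			return true;
	return false;
}

/* Start with every port trapped, then clear one bit per opened port.
 * Bitmap A holds ports 0x0000-0x7fff, bitmap B holds 0x8000-0xffff. */
static void build_io_bitmaps(uint8_t a[4096], uint8_t b[4096],
			     const struct ioport_range *r, int n)
{
	memset(a, 0xff, 4096);
	memset(b, 0xff, 4096);
	for (int i = 0; i < n; i++)
		for (uint32_t p = r[i].addr;
		     p < r[i].addr + r[i].len && p <= 0xffff; p++) {
			uint8_t *bm = p < 0x8000 ? a : b;
			uint32_t bit = p & 0x7fff;
			bm[bit / 8] &= (uint8_t)~(1u << (bit % 8));
		}
}

/* True if a guest access to port p would cause a VM exit. */
static bool port_traps(const uint8_t a[4096], const uint8_t b[4096],
		       uint32_t p)
{
	const uint8_t *bm = p < 0x8000 ? a : b;
	uint32_t bit = p & 0x7fff;
	return bm[bit / 8] & (1u << (bit % 8));
}
```

Because the ranges are processed independently, their order in the request does not matter; a four-byte PM timer range opens exactly four consecutive ports.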
Index: kvm/arch/x86/kvm/x86.c
===================================================================
--- kvm.orig/arch/x86/kvm/x86.c
+++ kvm/arch/x86/kvm/x86.c
@@ -27,6 +27,7 @@
#include <linux/module.h>
#include <linux/mman.h>
#include <linux/highmem.h>
+#include <linux/acpi_pmtmr.h>
#include <asm/uaccess.h>
#include <asm/msr.h>
@@ -1765,13 +1766,18 @@ static void kvm_init_msr_list(void)
static int kvm_init_ioport_list(void)
{
- allowed_open_ioports = kmalloc(sizeof(struct kvm_ioport) +
+ allowed_open_ioports = kmalloc(sizeof(struct kvm_ioport) * 2 +
sizeof(struct kvm_ioport_list), GFP_KERNEL);
if (!allowed_open_ioports)
return -ENOMEM;
allowed_open_ioports->nranges = 1;
allowed_open_ioports->ioports[0].addr = 0x80;
allowed_open_ioports->ioports[0].len = 1;
+ if (pmtmr_ioport) {
+ allowed_open_ioports->nranges++;
+ allowed_open_ioports->ioports[1].addr = pmtmr_ioport;
+ allowed_open_ioports->ioports[1].len = 4;
+ }
return 0;
}
--
* [patch 04/12] KVM: move muldiv64 to x86.c, export
2008-05-29 22:22 [patch 00/12] fake ACPI C2 emulation v2 Marcelo Tosatti
` (2 preceding siblings ...)
2008-05-29 22:22 ` [patch 03/12] KVM: allow userspace to open access to ACPI pmtimer Marcelo Tosatti
@ 2008-05-29 22:22 ` Marcelo Tosatti
2008-05-29 22:22 ` [patch 05/12] KVM: in-kernel ACPI timer emulation Marcelo Tosatti
` (8 subsequent siblings)
12 siblings, 0 replies; 27+ messages in thread
From: Marcelo Tosatti @ 2008-05-29 22:22 UTC
To: Avi Kivity
Cc: Chris Wright, Glauber Costa, Anthony Liguori, kvm,
Marcelo Tosatti
[-- Attachment #1: muldiv64 --]
[-- Type: text/plain, Size: 2222 bytes --]
This should probably go to lib/.
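A userspace sketch of the same 96-bit-intermediate computation, useful for checking muldiv64() outside the kernel. It uses explicit shifts instead of the kernel's union, so it does not depend on host byte order, and plain / and % instead of div64_u64() and mod_64(); like the original, it assumes the final quotient fits in 64 bits.

```c
#include <stdint.h>

/* (a * b) / c without overflowing the a * b product. */
static uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
{
	uint64_t rl = (a & 0xffffffffULL) * b; /* low 32 bits of a, times b */
	uint64_t rh = (a >> 32) * b;           /* high 32 bits of a, times b */
	uint64_t hi, lo;

	rh += rl >> 32;                        /* carry into the high part */
	hi = rh / c;
	/* remainder of the high part < c, so this fits in 64 bits */
	lo = (((rh % c) << 32) + (rl & 0xffffffffULL)) / c;
	return (hi << 32) | lo;
}
```

For example, converting 10 hours of uptime in nanoseconds to 3.579545 MHz PM-timer ticks overflows a plain 64-bit a * b, while muldiv64(36000000000000ULL, 3579545, 1000000000) returns 128863620000.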
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Index: kvm/arch/x86/kvm/i8254.c
===================================================================
--- kvm.orig/arch/x86/kvm/i8254.c
+++ kvm/arch/x86/kvm/i8254.c
@@ -45,26 +45,6 @@
#define RW_STATE_WORD0 3
#define RW_STATE_WORD1 4
-/* Compute with 96 bit intermediate result: (a*b)/c */
-static u64 muldiv64(u64 a, u32 b, u32 c)
-{
- union {
- u64 ll;
- struct {
- u32 low, high;
- } l;
- } u, res;
- u64 rl, rh;
-
- u.ll = a;
- rl = (u64)u.l.low * (u64)b;
- rh = (u64)u.l.high * (u64)b;
- rh += (rl >> 32);
- res.l.high = div64_u64(rh, c);
- res.l.low = div64_u64(((mod_64(rh, c) << 32) + (rl & 0xffffffff)), c);
- return res.ll;
-}
-
static void pit_set_gate(struct kvm *kvm, int channel, u32 val)
{
struct kvm_kpit_channel_state *c =
Index: kvm/arch/x86/kvm/x86.c
===================================================================
--- kvm.orig/arch/x86/kvm/x86.c
+++ kvm/arch/x86/kvm/x86.c
@@ -4116,3 +4116,30 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu
smp_call_function_single(ipi_pcpu, vcpu_kick_intr, vcpu, 0, 0);
put_cpu();
}
+
+#ifndef CONFIG_X86_64
+#define mod_64(x, y) ((x) - (y) * div64_u64(x, y))
+#else
+#define mod_64(x, y) ((x) % (y))
+#endif
+
+/* Compute with 96 bit intermediate result: (a*b)/c */
+u64 muldiv64(u64 a, u32 b, u32 c)
+{
+ union {
+ u64 ll;
+ struct {
+ u32 low, high;
+ } l;
+ } u, res;
+ u64 rl, rh;
+
+ u.ll = a;
+ rl = (u64)u.l.low * (u64)b;
+ rh = (u64)u.l.high * (u64)b;
+ rh += (rl >> 32);
+ res.l.high = div64_u64(rh, c);
+ res.l.low = div64_u64(((mod_64(rh, c) << 32) + (rl & 0xffffffff)), c);
+ return res.ll;
+}
+EXPORT_SYMBOL(muldiv64);
Index: kvm/include/asm-x86/kvm_host.h
===================================================================
--- kvm.orig/include/asm-x86/kvm_host.h
+++ kvm/include/asm-x86/kvm_host.h
@@ -660,6 +660,8 @@ static inline void kvm_inject_gp(struct
kvm_queue_exception_e(vcpu, GP_VECTOR, error_code);
}
+u64 muldiv64(u64, u32, u32);
+
#define ASM_VMX_VMCLEAR_RAX ".byte 0x66, 0x0f, 0xc7, 0x30"
#define ASM_VMX_VMLAUNCH ".byte 0x0f, 0x01, 0xc2"
#define ASM_VMX_VMRESUME ".byte 0x0f, 0x01, 0xc3"
--
* [patch 05/12] KVM: in-kernel ACPI timer emulation
2008-05-29 22:22 [patch 00/12] fake ACPI C2 emulation v2 Marcelo Tosatti
` (3 preceding siblings ...)
2008-05-29 22:22 ` [patch 04/12] KVM: move muldiv64 to x86.c, export Marcelo Tosatti
@ 2008-05-29 22:22 ` Marcelo Tosatti
2008-05-29 22:22 ` [patch 06/12] QEMU/KVM: self-disabling C2 emulation Marcelo Tosatti
` (7 subsequent siblings)
12 siblings, 0 replies; 27+ messages in thread
From: Marcelo Tosatti @ 2008-05-29 22:22 UTC
To: Avi Kivity
Cc: Chris Wright, Glauber Costa, Anthony Liguori, kvm,
Marcelo Tosatti
[-- Attachment #1: acpi-tmr-inkernel --]
[-- Type: text/plain, Size: 6863 bytes --]
The C1 idle path reads the pmtimer very often, so move the ACPI pmtimer
emulation into the kernel.
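A minimal userspace model of the arithmetic in get_pmtmr() and kvm_vm_ioctl_set_acpi_timer(): the guest sees host monotonic time converted to 3.579545 MHz ticks, plus a per-VM offset, truncated to 24 bits. On the migration destination the offset is chosen so that the counter resumes from the value saved on the source. The helper names are hypothetical; host time is passed in as nanoseconds instead of calling ktime_get(), and the tick conversion splits whole seconds from the remainder rather than calling muldiv64().

```c
#include <stdint.h>

#define PMTMR_TICKS_PER_SEC 3579545ULL /* ACPI PM timer frequency */
#define ACPI_PM_MASK 0xffffffULL       /* the counter is 24 bits wide */
#define NSEC_PER_SEC 1000000000ULL

/* Host time in ns -> PM timer ticks, modulo 2^24 (exact floor). */
static uint32_t ns_to_pmtmr(uint64_t ns)
{
	uint64_t sec = ns / NSEC_PER_SEC, rem = ns % NSEC_PER_SEC;

	return (uint32_t)((sec * PMTMR_TICKS_PER_SEC +
			   rem * PMTMR_TICKS_PER_SEC / NSEC_PER_SEC) &
			  ACPI_PM_MASK);
}

/* Value a guest read returns: host ticks plus the per-VM offset. */
static uint32_t pmtmr_get(uint64_t now_ns, uint32_t offset)
{
	return (ns_to_pmtmr(now_ns) + offset) & ACPI_PM_MASK;
}

/* Offset that makes pmtmr_get(now_ns, offset) == wanted (set path). */
static uint32_t pmtmr_offset_for(uint64_t now_ns, uint32_t wanted)
{
	return (wanted - ns_to_pmtmr(now_ns)) & ACPI_PM_MASK;
}
```

Since the offset is computed modulo 2^24, the restored counter keeps ticking at the same rate and wraps exactly like the hardware timer.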
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Index: kvm/arch/x86/kvm/Makefile
===================================================================
--- kvm.orig/arch/x86/kvm/Makefile
+++ kvm/arch/x86/kvm/Makefile
@@ -10,7 +10,7 @@ endif
EXTRA_CFLAGS += -Ivirt/kvm -Iarch/x86/kvm
kvm-objs := $(common-objs) x86.o mmu.o x86_emulate.o i8259.o irq.o lapic.o \
- i8254.o
+ i8254.o acpi.o
obj-$(CONFIG_KVM) += kvm.o
kvm-intel-objs = vmx.o
obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
Index: kvm/arch/x86/kvm/acpi.c
===================================================================
--- /dev/null
+++ kvm/arch/x86/kvm/acpi.c
@@ -0,0 +1,103 @@
+#include <linux/kvm_host.h>
+#include <linux/kvm.h>
+#include <linux/acpi_pmtmr.h>
+#include "iodev.h"
+#include "irq.h"
+#include "acpi.h"
+
+static void get_pmtmr(int pmtmr_offset, void *data)
+{
+ u32 d;
+ d = muldiv64(ktime_to_ns(ktime_get()), PMTMR_TICKS_PER_SEC, NSEC_PER_SEC);
+ d &= ACPI_PM_MASK;
+ d += pmtmr_offset;
+ d &= ACPI_PM_MASK;
+ memcpy(data, &d, sizeof d);
+}
+
+static void acpi_ioport_read(struct kvm_io_device *this, gpa_t addr, int len,
+ void *data)
+{
+ struct kvm_acpi_timer *acpi = (struct kvm_acpi_timer *)this->private;
+
+ if (len == 4)
+ get_pmtmr(acpi->pmtmr_offset, data);
+
+ return;
+}
+
+static void acpi_ioport_write(struct kvm_io_device *this, gpa_t addr, int len,
+ const void *data)
+{
+ return;
+}
+
+static int acpi_in_range(struct kvm_io_device *this, gpa_t addr)
+{
+ struct kvm_acpi_timer *acpi = (struct kvm_acpi_timer *)this->private;
+ struct kvm *kvm = acpi->kvm;
+
+ if (!kvm->arch.vacpi_timer)
+ return 0;
+
+ return (addr == kvm->arch.vacpi_timer->pmtmr_state.base_address);
+}
+
+static void kvm_acpi_free(struct kvm_io_device *this)
+{
+ struct kvm_acpi_timer *acpi = (struct kvm_acpi_timer *)this->private;
+
+ kfree(acpi);
+}
+
+int kvm_vm_ioctl_get_acpi_timer(struct kvm *kvm,
+ struct kvm_acpi_timer_state *pmtmr)
+{
+ if (!kvm->arch.vacpi_timer)
+ return -EINVAL;
+
+ get_pmtmr(kvm->arch.vacpi_timer->pmtmr_offset, &pmtmr->timer_val);
+ pmtmr->base_address = kvm->arch.vacpi_timer->pmtmr_state.base_address;
+ return 0;
+}
+
+int kvm_vm_ioctl_set_acpi_timer(struct kvm *kvm,
+ struct kvm_acpi_timer_state *pmtmr)
+{
+ u32 d;
+
+ if (!kvm->arch.vacpi_timer)
+ return -EINVAL;
+ kvm->arch.vacpi_timer->pmtmr_state.base_address = pmtmr->base_address;
+
+ d = muldiv64(ktime_to_ns(ktime_get()), PMTMR_TICKS_PER_SEC, NSEC_PER_SEC);
+ d &= ACPI_PM_MASK;
+ kvm->arch.vacpi_timer->pmtmr_offset = (pmtmr->timer_val - d) & ACPI_PM_MASK;
+ return 0;
+}
+
+/*
+ * Note: matches BIOS (currently hardcoded) definition.
+ */
+#define ACPI_PMTMR_BASE 0xb008
+
+int kvm_acpi_init(struct kvm *kvm)
+{
+ struct kvm_acpi_timer *acpi;
+
+ acpi = kzalloc(sizeof(struct kvm_acpi_timer), GFP_KERNEL);
+ if (!acpi)
+ return -ENOMEM;
+
+ acpi->dev.read = acpi_ioport_read;
+ acpi->dev.write = acpi_ioport_write;
+ acpi->dev.in_range = acpi_in_range;
+ acpi->dev.private = acpi;
+ acpi->kvm = kvm;
+ acpi->dev.destructor = kvm_acpi_free;
+ acpi->pmtmr_state.base_address = ACPI_PMTMR_BASE;
+ kvm->arch.vacpi_timer = acpi;
+ kvm_io_bus_register_dev(&kvm->pio_bus, &acpi->dev);
+
+ return 0;
+}
Index: kvm/arch/x86/kvm/acpi.h
===================================================================
--- /dev/null
+++ kvm/arch/x86/kvm/acpi.h
@@ -0,0 +1,19 @@
+#ifndef __KVM_ACPI_H
+#define __KVM_ACPI_H
+
+struct kvm_acpi_timer {
+ struct kvm_io_device dev;
+ struct kvm *kvm;
+ int pmtmr_offset;
+ struct kvm_acpi_timer_state pmtmr_state;
+};
+
+int kvm_acpi_init(struct kvm *kvm);
+
+int kvm_vm_ioctl_get_acpi_timer(struct kvm *kvm,
+ struct kvm_acpi_timer_state *pmtmr);
+
+int kvm_vm_ioctl_set_acpi_timer(struct kvm *kvm,
+ struct kvm_acpi_timer_state *pmtmr);
+
+#endif
Index: kvm/arch/x86/kvm/x86.c
===================================================================
--- kvm.orig/arch/x86/kvm/x86.c
+++ kvm/arch/x86/kvm/x86.c
@@ -19,6 +19,7 @@
#include "mmu.h"
#include "i8254.h"
#include "tss.h"
+#include "acpi.h"
#include <linux/clocksource.h>
#include <linux/kvm.h>
@@ -794,6 +795,7 @@ int kvm_dev_ioctl_check_extension(long e
case KVM_CAP_NOP_IO_DELAY:
case KVM_CAP_MP_STATE:
case KVM_CAP_OPEN_IOPORT:
+ case KVM_CAP_ACPI_TIMER:
r = 1;
break;
case KVM_CAP_VAPIC:
@@ -1742,6 +1744,31 @@ long kvm_arch_vm_ioctl(struct file *filp
goto out;
break;
}
+ case KVM_CREATE_ACPI_TIMER: {
+ r = kvm_acpi_init(kvm);
+ break;
+ }
+ case KVM_GET_ACPI_TIMER: {
+ struct kvm_acpi_timer_state acpi_timer;
+ r = kvm_vm_ioctl_get_acpi_timer(kvm, &acpi_timer);
+ if (r)
+ goto out;
+ r = -EFAULT;
+ if (copy_to_user(argp, &acpi_timer, sizeof acpi_timer))
+ goto out;
+ r = 0;
+ break;
+ }
+ case KVM_SET_ACPI_TIMER: {
+ struct kvm_acpi_timer_state acpi_timer;
+ r = -EFAULT;
+ if (copy_from_user(&acpi_timer, argp, sizeof acpi_timer))
+ goto out;
+ r = kvm_vm_ioctl_set_acpi_timer(kvm, &acpi_timer);
+ if (r)
+ goto out;
+ break;
+ }
default:
;
}
Index: kvm/include/linux/kvm.h
===================================================================
--- kvm.orig/include/linux/kvm.h
+++ kvm/include/linux/kvm.h
@@ -347,6 +347,7 @@ struct kvm_trace_rec {
#define KVM_CAP_PV_MMU 13
#define KVM_CAP_MP_STATE 14
#define KVM_CAP_OPEN_IOPORT 15
+#define KVM_CAP_ACPI_TIMER 16
/*
* ioctls for VM fds
@@ -373,6 +374,9 @@ struct kvm_trace_rec {
#define KVM_GET_PIT _IOWR(KVMIO, 0x65, struct kvm_pit_state)
#define KVM_SET_PIT _IOR(KVMIO, 0x66, struct kvm_pit_state)
#define KVM_SET_OPEN_IOPORT _IOR(KVMIO, 0x67, struct kvm_ioport_list)
+#define KVM_CREATE_ACPI_TIMER _IO(KVMIO, 0x68)
+#define KVM_GET_ACPI_TIMER _IOWR(KVMIO, 0x69, struct kvm_acpi_timer_state)
+#define KVM_SET_ACPI_TIMER _IOR(KVMIO, 0x6a, struct kvm_acpi_timer_state)
/*
* ioctls for vcpu fds
Index: kvm/include/asm-x86/kvm.h
===================================================================
--- kvm.orig/include/asm-x86/kvm.h
+++ kvm/include/asm-x86/kvm.h
@@ -220,6 +220,12 @@ struct kvm_ioport_list {
struct kvm_ioport ioports[0];
};
+/* for KVM_GET_ACPI_TIMER and KVM_SET_ACPI_TIMER */
+struct kvm_acpi_timer_state {
+ __u32 base_address;
+ __u32 timer_val;
+};
+
#define KVM_TRC_INJ_VIRQ (KVM_TRC_HANDLER + 0x02)
#define KVM_TRC_REDELIVER_EVT (KVM_TRC_HANDLER + 0x03)
#define KVM_TRC_PEND_INTR (KVM_TRC_HANDLER + 0x04)
Index: kvm/include/asm-x86/kvm_host.h
===================================================================
--- kvm.orig/include/asm-x86/kvm_host.h
+++ kvm/include/asm-x86/kvm_host.h
@@ -318,6 +318,7 @@ struct kvm_arch{
struct kvm_pic *vpic;
struct kvm_ioapic *vioapic;
struct kvm_pit *vpit;
+ struct kvm_acpi_timer *vacpi_timer;
int round_robin_prev_vcpu;
unsigned int tss_addr;
--
* [patch 06/12] QEMU/KVM: self-disabling C2 emulation
2008-05-29 22:22 [patch 00/12] fake ACPI C2 emulation v2 Marcelo Tosatti
` (4 preceding siblings ...)
2008-05-29 22:22 ` [patch 05/12] KVM: in-kernel ACPI timer emulation Marcelo Tosatti
@ 2008-05-29 22:22 ` Marcelo Tosatti
2008-05-29 22:22 ` [patch 07/12] libkvm: interface to KVM_SET_OPEN_IOPORT Marcelo Tosatti
` (6 subsequent siblings)
12 siblings, 0 replies; 27+ messages in thread
From: Marcelo Tosatti @ 2008-05-29 22:22 UTC
To: Avi Kivity; +Cc: Chris Wright, Glauber Costa, Anthony Liguori, kvm
[-- Attachment #1: acpi-c2-fake --]
[-- Type: text/plain, Size: 13872 bytes --]
Advertise C2 support through the per-processor ACPI _CST packages, but
write an invalid latency value the first time the guest attempts to
enter C2 by reading the P_LVL2 port.
This way the guest considers the TSC unreliable, while we avoid the cost
of the APIC timer broadcasts on entry/exit that states deeper than C1
require.
Falling back to plain hlt idle, which does not use the pmtimer to
measure idle time, would be preferable to C1; however, Linux guests with
CONFIG_CPUIDLE enabled then fall back to poll_idle, which is very
inefficient.
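A small C model of the self-disabling handshake, under stated assumptions: every CPU starts with the advertised C2 latency of 2 us; a guest read of P_LVL2 makes QEMU set that CPU's bit in the 16-bit PWC register at 0xb040 and raise an SCI; the _L06 method then stores 0xfffff into the latency field of each flagged CPU's _CST entry. The 100 us cutoff is an assumption borrowed from the ACPI FADT rule that a C2 latency above 100 means C2 is unsupported, which, as far as we know, the Linux ACPI processor driver also applies to _CST entries. All names are hypothetical, and the _CST re-evaluation triggered by Notify(0x81) is reduced to a plain latency check.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_DSDT_CPUS 15          /* CPU0..CPUE in acpi-dsdt.dsl */
#define C2_LATENCY_INIT 2        /* latency advertised in _CST (us) */
#define C2_LATENCY_BOGUS 0xfffff /* value stored by _L06 */
#define MAX_C2_LATENCY 100       /* assumed cutoff for a usable C2 */

struct c2_model {
	uint16_t pwc;                   /* PWC register, one bit per CPU */
	uint32_t latency[NR_DSDT_CPUS]; /* latency field of each _CST */
};

static void c2_model_init(struct c2_model *m)
{
	m->pwc = 0;
	for (int i = 0; i < NR_DSDT_CPUS; i++)
		m->latency[i] = C2_LATENCY_INIT;
}

/* Host side: the guest read P_LVL2 on @cpu, so flag that CPU.
 * Like the QEMU code, the register is cleared before each notify. */
static void p_lvl2_read(struct c2_model *m, int cpu)
{
	m->pwc = 0;
	m->pwc |= (uint16_t)(1u << cpu);
}

/* Guest side: _L06 patches the _CST latency of every flagged CPU. */
static void gpe_l06(struct c2_model *m)
{
	for (int i = 0; i < NR_DSDT_CPUS; i++)
		if (m->pwc & (1u << i))
			m->latency[i] = C2_LATENCY_BOGUS;
}

/* Would the guest still consider C2 usable on @cpu? */
static bool c2_usable(const struct c2_model *m, int cpu)
{
	return m->latency[cpu] <= MAX_C2_LATENCY;
}
```

Only the CPU that actually tried to enter C2 loses it, so each vCPU disables C2 for itself on its first idle attempt.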
Index: kvm-userspace.realtip/bios/acpi-dsdt.dsl
===================================================================
--- kvm-userspace.realtip.orig/bios/acpi-dsdt.dsl
+++ kvm-userspace.realtip/bios/acpi-dsdt.dsl
@@ -33,8 +33,20 @@ DefinitionBlock (
PRU, 8,
PRD, 8,
}
+ OperationRegion(PWNO, SystemIO, 0xb040, 0x02)
+ Field (PWNO, WordAcc, NoLock, WriteAsZeros)
+ {
+ PWC, 16,
+ }
- Processor (CPU0, 0x00, 0x0000b010, 0x06) {Method (_STA) { Return(0xF)}}
+ Processor (CPU0, 0x00, 0x0000b010, 0x06) {
+ Method (_STA) { Return(0xF)}
+ Name(_CST, Package() {
+ 1, Package() {
+ ResourceTemplate() {Register(SystemIO, 8, 0, 0xb014)},
+ 2, 2, 300},
+ })
+ }
Processor (CPU1, 0x01, 0x0000b010, 0x06) {
Name (TMP, Buffer(0x8) {0x0, 0x8, 0x01, 0x01, 0x1, 0x0, 0x0, 0x0})
Method(_MAT, 0) {
@@ -44,6 +56,11 @@ DefinitionBlock (
Method (_STA) {
Return(0xF)
}
+ Name(_CST, Package() {
+ 1, Package() {
+ ResourceTemplate() {Register(SystemIO, 8, 0, 0xb014)},
+ 2, 2, 300},
+ })
}
Processor (CPU2, 0x02, 0x0000b010, 0x06) {
Name (TMP, Buffer(0x8) {0x0, 0x8, 0x02, 0x02, 0x1, 0x0, 0x0, 0x0})
@@ -54,6 +71,11 @@ DefinitionBlock (
Method (_STA) {
Return(0xF)
}
+ Name(_CST, Package() {
+ 1, Package() {
+ ResourceTemplate() {Register(SystemIO, 8, 0, 0xb014)},
+ 2, 2, 300},
+ })
}
Processor (CPU3, 0x03, 0x0000b010, 0x06) {
Name (TMP, Buffer(0x8) {0x0, 0x8, 0x03, 0x03, 0x1, 0x0, 0x0, 0x0})
@@ -64,6 +86,11 @@ DefinitionBlock (
Method (_STA) {
Return(0xF)
}
+ Name(_CST, Package() {
+ 1, Package() {
+ ResourceTemplate() {Register(SystemIO, 8, 0, 0xb014)},
+ 2, 2, 300},
+ })
}
Processor (CPU4, 0x04, 0x0000b010, 0x06) {
Name (TMP, Buffer(0x8) {0x0, 0x8, 0x04, 0x04, 0x1, 0x0, 0x0, 0x0})
@@ -74,6 +101,11 @@ DefinitionBlock (
Method (_STA) {
Return(0xF)
}
+ Name(_CST, Package() {
+ 1, Package() {
+ ResourceTemplate() {Register(SystemIO, 8, 0, 0xb014)},
+ 2, 2, 300},
+ })
}
Processor (CPU5, 0x05, 0x0000b010, 0x06) {
Name (TMP, Buffer(0x8) {0x0, 0x8, 0x05, 0x05, 0x1, 0x0, 0x0, 0x0})
@@ -84,6 +116,11 @@ DefinitionBlock (
Method (_STA) {
Return(0xF)
}
+ Name(_CST, Package() {
+ 1, Package() {
+ ResourceTemplate() {Register(SystemIO, 8, 0, 0xb014)},
+ 2, 2, 300},
+ })
}
Processor (CPU6, 0x06, 0x0000b010, 0x06) {
Name (TMP, Buffer(0x8) {0x0, 0x8, 0x06, 0x06, 0x1, 0x0, 0x0, 0x0})
@@ -94,6 +131,11 @@ DefinitionBlock (
Method (_STA) {
Return(0xF)
}
+ Name(_CST, Package() {
+ 1, Package() {
+ ResourceTemplate() {Register(SystemIO, 8, 0, 0xb014)},
+ 2, 2, 300},
+ })
}
Processor (CPU7, 0x07, 0x0000b010, 0x06) {
Name (TMP, Buffer(0x8) {0x0, 0x8, 0x07, 0x07, 0x1, 0x0, 0x0, 0x0})
@@ -104,6 +146,11 @@ DefinitionBlock (
Method (_STA) {
Return(0xF)
}
+ Name(_CST, Package() {
+ 1, Package() {
+ ResourceTemplate() {Register(SystemIO, 8, 0, 0xb014)},
+ 2, 2, 300},
+ })
}
Processor (CPU8, 0x08, 0x0000b010, 0x06) {
Name (TMP, Buffer(0x8) {0x0, 0x8, 0x08, 0x08, 0x1, 0x0, 0x0, 0x0})
@@ -114,6 +161,11 @@ DefinitionBlock (
Method (_STA) {
Return(0xF)
}
+ Name(_CST, Package() {
+ 1, Package() {
+ ResourceTemplate() {Register(SystemIO, 8, 0, 0xb014)},
+ 2, 2, 300},
+ })
}
Processor (CPU9, 0x09, 0x0000b010, 0x06) {
Name (TMP, Buffer(0x8) {0x0, 0x8, 0x09, 0x09, 0x1, 0x0, 0x0, 0x0})
@@ -124,6 +176,11 @@ DefinitionBlock (
Method (_STA) {
Return(0xF)
}
+ Name(_CST, Package() {
+ 1, Package() {
+ ResourceTemplate() {Register(SystemIO, 8, 0, 0xb014)},
+ 2, 2, 300},
+ })
}
Processor (CPUA, 0x0a, 0x0000b010, 0x06) {
Name (TMP, Buffer(0x8) {0x0, 0x8, 0x0A, 0x0A, 0x1, 0x0, 0x0, 0x0})
@@ -134,6 +191,11 @@ DefinitionBlock (
Method (_STA) {
Return(0xF)
}
+ Name(_CST, Package() {
+ 1, Package() {
+ ResourceTemplate() {Register(SystemIO, 8, 0, 0xb014)},
+ 2, 2, 300},
+ })
}
Processor (CPUB, 0x0b, 0x0000b010, 0x06) {
Name (TMP, Buffer(0x8) {0x0, 0x8, 0x0B, 0x0B, 0x1, 0x0, 0x0, 0x0})
@@ -144,6 +206,11 @@ DefinitionBlock (
Method (_STA) {
Return(0xF)
}
+ Name(_CST, Package() {
+ 1, Package() {
+ ResourceTemplate() {Register(SystemIO, 8, 0, 0xb014)},
+ 2, 2, 300},
+ })
}
Processor (CPUC, 0x0c, 0x0000b010, 0x06) {
Name (TMP, Buffer(0x8) {0x0, 0x8, 0x0C, 0x0C, 0x1, 0x0, 0x0, 0x0})
@@ -154,6 +221,11 @@ DefinitionBlock (
Method (_STA) {
Return(0xF)
}
+ Name(_CST, Package() {
+ 1, Package() {
+ ResourceTemplate() {Register(SystemIO, 8, 0, 0xb014)},
+ 2, 2, 300},
+ })
}
Processor (CPUD, 0x0d, 0x0000b010, 0x06) {
Name (TMP, Buffer(0x8) {0x0, 0x8, 0x0D, 0x0D, 0x1, 0x0, 0x0, 0x0})
@@ -164,6 +236,11 @@ DefinitionBlock (
Method (_STA) {
Return(0xF)
}
+ Name(_CST, Package() {
+ 1, Package() {
+ ResourceTemplate() {Register(SystemIO, 8, 0, 0xb014)},
+ 2, 2, 300},
+ })
}
Processor (CPUE, 0x0e, 0x0000b010, 0x06) {
Name (TMP, Buffer(0x8) {0x0, 0x8, 0x0E, 0x0E, 0x1, 0x0, 0x0, 0x0})
@@ -174,6 +251,11 @@ DefinitionBlock (
Method (_STA) {
Return(0xF)
}
+ Name(_CST, Package() {
+ 1, Package() {
+ ResourceTemplate() {Register(SystemIO, 8, 0, 0xb014)},
+ 2, 2, 300},
+ })
}
}
@@ -1544,6 +1626,81 @@ DefinitionBlock (
Return(0x01)
}
Method(_L06) {
+ If (And(\_PR.PWC, 0x1)) {
+ Store (0xfffff, Index (DeRefOf (Index (\_PR.CPU0._CST, 1)), 2))
+ Notify(\_PR.CPU0, 0x81)
+ }
+
+ If (And(\_PR.PWC, 0x2)) {
+ Store (0xfffff, Index (DeRefOf (Index (\_PR.CPU1._CST, 1)), 2))
+ Notify(\_PR.CPU1, 0x81)
+ }
+
+ If (And(\_PR.PWC, 0x4)) {
+ Store (0xfffff, Index (DeRefOf (Index (\_PR.CPU2._CST, 1)), 2))
+ Notify(\_PR.CPU2, 0x81)
+ }
+
+ If (And(\_PR.PWC, 0x8)) {
+ Store (0xfffff, Index (DeRefOf (Index (\_PR.CPU3._CST, 1)), 2))
+ Notify(\_PR.CPU3, 0x81)
+ }
+
+ If (And(\_PR.PWC, 0x10)) {
+ Store (0xfffff, Index (DeRefOf (Index (\_PR.CPU4._CST, 1)), 2))
+ Notify(\_PR.CPU4, 0x81)
+ }
+
+ If (And(\_PR.PWC, 0x20)) {
+ Store (0xfffff, Index (DeRefOf (Index (\_PR.CPU5._CST, 1)), 2))
+ Notify(\_PR.CPU5, 0x81)
+ }
+
+ If (And(\_PR.PWC, 0x40)) {
+ Store (0xfffff, Index (DeRefOf (Index (\_PR.CPU6._CST, 1)), 2))
+ Notify(\_PR.CPU6, 0x81)
+ }
+
+ If (And(\_PR.PWC, 0x80)) {
+ Store (0xfffff, Index (DeRefOf (Index (\_PR.CPU7._CST, 1)), 2))
+ Notify(\_PR.CPU7, 0x81)
+ }
+
+ If (And(\_PR.PWC, 0x100)) {
+ Store (0xfffff, Index (DeRefOf (Index (\_PR.CPU8._CST, 1)), 2))
+ Notify(\_PR.CPU8, 0x81)
+ }
+
+ If (And(\_PR.PWC, 0x200)) {
+ Store (0xfffff, Index (DeRefOf (Index (\_PR.CPU9._CST, 1)), 2))
+ Notify(\_PR.CPU9, 0x81)
+ }
+
+ If (And(\_PR.PWC, 0x400)) {
+ Store (0xfffff, Index (DeRefOf (Index (\_PR.CPUA._CST, 1)), 2))
+ Notify(\_PR.CPUA, 0x81)
+ }
+
+ If (And(\_PR.PWC, 0x800)) {
+ Store (0xfffff, Index (DeRefOf (Index (\_PR.CPUB._CST, 1)), 2))
+ Notify(\_PR.CPUB, 0x81)
+ }
+
+ If (And(\_PR.PWC, 0x1000)) {
+ Store (0xfffff, Index (DeRefOf (Index (\_PR.CPUC._CST, 1)), 2))
+ Notify(\_PR.CPUC, 0x81)
+ }
+
+ If (And(\_PR.PWC, 0x2000)) {
+ Store (0xfffff, Index (DeRefOf (Index (\_PR.CPUD._CST, 1)), 2))
+ Notify(\_PR.CPUD, 0x81)
+ }
+
+ If (And(\_PR.PWC, 0x4000)) {
+ Store (0xfffff, Index (DeRefOf (Index (\_PR.CPUE._CST, 1)), 2))
+ Notify(\_PR.CPUE, 0x81)
+ }
+
Return(0x01)
}
Method(_L07) {
Index: kvm-userspace.realtip/qemu/hw/acpi.c
===================================================================
--- kvm-userspace.realtip.orig/qemu/hw/acpi.c
+++ kvm-userspace.realtip/qemu/hw/acpi.c
@@ -121,6 +121,31 @@ static void pm_tmr_timer(void *opaque)
pm_update_sci(s);
}
+/*
+ * Fake C2 emulation, so the OS will consider the TSC unreliable
+ * and fall back to C1 after the latency is updated to a high value
+ * in acpi-dsdt.dsl.
+ */
+static void qemu_system_cpu_power_notify(int cpu);
+static uint32_t pm_ioport_readb(void *opaque, uint32_t addr)
+{
+ CPUState *env = cpu_single_env;
+
+ addr &= 0x3f;
+ switch (addr) {
+ case 0x14: /* P_LVL2 */
+ qemu_system_cpu_power_notify(env->cpu_index);
+ }
+#ifdef DEBUG
+ printf("pm_ioport_readb addr=%x\n", addr);
+#endif
+ return 0;
+}
+
+static void pm_ioport_writeb(void *opaque, uint32_t addr, uint32_t val)
+{
+}
+
static void pm_ioport_writew(void *opaque, uint32_t addr, uint32_t val)
{
PIIX4PMState *s = opaque;
@@ -420,6 +445,8 @@ static void pm_io_space_update(PIIX4PMSt
#if defined(DEBUG)
printf("PM: mapping to 0x%x\n", pm_io_base);
#endif
+ register_ioport_write(pm_io_base, 64, 1, pm_ioport_writeb, s);
+ register_ioport_read(pm_io_base, 64, 1, pm_ioport_readb, s);
register_ioport_write(pm_io_base, 64, 2, pm_ioport_writew, s);
register_ioport_read(pm_io_base, 64, 2, pm_ioport_readw, s);
register_ioport_write(pm_io_base, 64, 4, pm_ioport_writel, s);
@@ -538,6 +565,7 @@ void qemu_system_powerdown(void)
}
#endif
#define GPE_BASE 0xafe0
+#define POWER_GPE_BASE 0xb040
#define PROC_BASE 0xaf00
#define PCI_BASE 0xae00
#define PCI_EJ_BASE 0xae08
@@ -554,7 +582,12 @@ struct pci_status {
uint32_t down;
};
+struct power_gpe_regs {
+ uint16_t cpus;
+};
+
static struct gpe_regs gpe;
+static struct power_gpe_regs power_gpe;
static struct pci_status pci0_status;
static uint32_t gpe_readb(void *opaque, uint32_t addr)
@@ -623,6 +656,23 @@ static void gpe_writeb(void *opaque, uin
#endif
}
+static uint32_t cpu_power_read(void *opaque, uint32_t addr)
+{
+ struct power_gpe_regs *p = opaque;
+
+#if defined(DEBUG)
+ printf("cpu power read %x == %x\n", addr, p->cpus);
+#endif
+ return p->cpus;
+}
+
+static void cpu_power_write(void *opaque, uint32_t addr, uint32_t val)
+{
+#if defined(DEBUG)
+ printf("cpu power write %x <== %x\n", addr, val);
+#endif
+}
+
static uint32_t pcihotplug_read(void *opaque, uint32_t addr)
{
uint32_t val = 0;
@@ -696,6 +746,9 @@ void qemu_system_hot_add_init(const char
register_ioport_write(PCI_EJ_BASE, 4, 4, pciej_write, NULL);
register_ioport_read(PCI_EJ_BASE, 4, 4, pciej_read, NULL);
+ register_ioport_write(POWER_GPE_BASE, 4, 2, cpu_power_write, &power_gpe);
+ register_ioport_read(POWER_GPE_BASE, 4, 2, cpu_power_read, &power_gpe);
+
model = cpu_model;
}
@@ -738,6 +791,16 @@ void qemu_system_cpu_hot_add(int cpu, in
disable_processor(&gpe, cpu);
qemu_set_irq(pm_state->irq, 0);
}
+
+static void qemu_system_cpu_power_notify(int cpu)
+{
+ power_gpe.cpus = 0;
+
+ qemu_set_irq(pm_state->irq, 1);
+ power_gpe.cpus |= (1 << cpu);
+ qemu_set_irq(pm_state->irq, 0);
+}
+
#endif
static void enable_device(struct pci_status *p, struct gpe_regs *g, int slot)
--
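The power GPE register above is a single `uint8_t` in which bit N is set to report CPU N. As written it can therefore only describe CPUs 0 through 7; for a larger `cpu`, `1 << cpu` no longer fits the field. Below is a minimal standalone sketch of that encoding. The names (`power_gpe_sketch`, `power_gpe_notify`, `power_gpe_read`) are hypothetical and not part of the patch.

```c
#include <stdint.h>

/* Hypothetical mirror of struct power_gpe_regs from the patch:
 * one status bit per CPU, bit N set means "CPU N was notified". */
struct power_gpe_sketch {
    uint8_t cpus;
};

/* Width of the bitmask: 8 bits, so CPUs 0..7 only. */
#define POWER_GPE_MAX_CPUS (8 * (int)sizeof(uint8_t))

/* Report a single CPU, as qemu_system_cpu_power_notify() does:
 * clear the previous status, then set the bit for this CPU.
 * Returns -1 for CPU indexes the 8-bit register cannot encode. */
static int power_gpe_notify(struct power_gpe_sketch *p, int cpu)
{
    if (cpu < 0 || cpu >= POWER_GPE_MAX_CPUS)
        return -1;
    p->cpus = (uint8_t)(1u << cpu);
    return 0;
}

/* What a guest read of POWER_GPE_BASE returns (cf. cpu_power_read). */
static uint32_t power_gpe_read(const struct power_gpe_sketch *p)
{
    return p->cpus;
}
```

The patch itself does not range-check `cpu`. Rejecting out-of-range values, as the sketch does, or widening the field are two possible options.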
* [patch 07/12] libkvm: interface to KVM_SET_OPEN_IOPORT
2008-05-29 22:22 [patch 00/12] fake ACPI C2 emulation v2 Marcelo Tosatti
` (5 preceding siblings ...)
2008-05-29 22:22 ` [patch 06/12] QEMU/KVM: self-disabling C2 emulation Marcelo Tosatti
@ 2008-05-29 22:22 ` Marcelo Tosatti
2008-05-29 22:22 ` [patch 08/12] QEMU/KVM: non-virtualized ACPI PMTimer support Marcelo Tosatti
` (5 subsequent siblings)
12 siblings, 0 replies; 27+ messages in thread
From: Marcelo Tosatti @ 2008-05-29 22:22 UTC
To: Avi Kivity
Cc: Chris Wright, Glauber Costa, Anthony Liguori, kvm,
Marcelo Tosatti
[-- Attachment #1: libkvm-open-ioports --]
[-- Type: text/plain, Size: 1217 bytes --]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Index: kvm-userspace.realtip/libkvm/libkvm.c
===================================================================
--- kvm-userspace.realtip.orig/libkvm/libkvm.c
+++ kvm-userspace.realtip/libkvm/libkvm.c
@@ -680,7 +680,18 @@ int kvm_set_irqchip(kvm_context_t kvm, s
}
return r;
}
+#endif
+
+#ifdef KVM_CAP_OPEN_IOPORT
+int kvm_set_open_ioports(kvm_context_t kvm, struct kvm_ioport_list *ioport_list)
+{
+ int r;
+ r = ioctl(kvm->fd, KVM_CHECK_EXTENSION, KVM_CAP_OPEN_IOPORT);
+ if (r > 0)
+ return ioctl(kvm->vm_fd, KVM_SET_OPEN_IOPORT, ioport_list);
+ return -ENOSYS;
+}
#endif
static int handle_io(kvm_context_t kvm, struct kvm_run *run, int vcpu)
Index: kvm-userspace.realtip/libkvm/libkvm.h
===================================================================
--- kvm-userspace.realtip.orig/libkvm/libkvm.h
+++ kvm-userspace.realtip/libkvm/libkvm.h
@@ -599,6 +599,16 @@ int kvm_set_pit(kvm_context_t kvm, struc
#endif
+#ifdef KVM_CAP_OPEN_IOPORT
+
+/*!
+ * \brief Set direct io port access
+ */
+
+int kvm_set_open_ioports(kvm_context_t kvm, struct kvm_ioport_list *ioport_list);
+#endif
+
+
#ifdef KVM_CAP_VAPIC
/*!
--
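`kvm_set_open_ioports()` takes a variable-length `struct kvm_ioport_list`: a range count followed by a flexible array of `{addr, len}` entries. The sketch below mirrors how `kvm_arch_open_pmtimer()` (later in the series) fills one in. The struct layouts are assumptions inferred from the field names the patch uses (`nranges`, `ioports[i].addr`, `ioports[i].len`); they are not the real kernel header definitions.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed layout, inferred from how the patch fills these structures;
 * the real definitions come from the kernel header in this series. */
struct ioport_range_sketch {
    uint32_t addr;
    uint32_t len;
};

struct ioport_list_sketch {
    uint32_t nranges;
    struct ioport_range_sketch ioports[]; /* flexible array member */
};

/* Allocate a zeroed list with room for n ranges; free() it afterwards. */
static struct ioport_list_sketch *ioport_list_alloc(uint32_t n)
{
    struct ioport_list_sketch *l;

    l = calloc(1, sizeof(*l) + n * sizeof(l->ioports[0]));
    if (l)
        l->nranges = n;
    return l;
}

/* Build the two-range list kvm_arch_open_pmtimer() hands to
 * kvm_set_open_ioports(): port 0x80 (1 byte) and the host PM timer
 * (4 bytes starting at pmtmr_base). */
static struct ioport_list_sketch *build_open_ioports(uint32_t pmtmr_base)
{
    struct ioport_list_sketch *l = ioport_list_alloc(2);

    if (!l)
        return NULL;
    l->ioports[0].addr = 0x80;
    l->ioports[0].len = 1;
    l->ioports[1].addr = pmtmr_base;
    l->ioports[1].len = 4;
    return l;
}
```

`kvm_close_direct_pmtimer()` resubmits a list that contains only port 0x80. This suggests that each call replaces the whole set of open ranges rather than adding to it.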
* [patch 08/12] QEMU/KVM: non-virtualized ACPI PMTimer support
2008-05-29 22:22 [patch 00/12] fake ACPI C2 emulation v2 Marcelo Tosatti
` (6 preceding siblings ...)
2008-05-29 22:22 ` [patch 07/12] libkvm: interface to KVM_SET_OPEN_IOPORT Marcelo Tosatti
@ 2008-05-29 22:22 ` Marcelo Tosatti
2008-05-29 22:22 ` [patch 09/12] libkvm: in-kernel ACPI pmtimer interface Marcelo Tosatti
` (4 subsequent siblings)
12 siblings, 0 replies; 27+ messages in thread
From: Marcelo Tosatti @ 2008-05-29 22:22 UTC
To: Avi Kivity
Cc: Chris Wright, Glauber Costa, Anthony Liguori, kvm,
Marcelo Tosatti
[-- Attachment #1: acpi-pmtimer --]
[-- Type: text/plain, Size: 14743 bytes --]
QEMU support for direct pmtimer reads. Hopefully it's safe, since it's a
read-only register?
With self-disabling C2 plus this patch, I'm seeing lower CPU usage when
idle with CONFIG_CPU_IDLE enabled; it is quite noticeable on SMP guests.
Windows XP is comparable to the standard setup (I've never seen it consume
less than 10% either way; it's usually 20-30%).
On migration, the destination host can either lack ACPI or have the timer
at a different I/O port, in which case emulation is necessary. Or, with
luck, the pmtimer is at the same address. Since the 24-bit counter wraps
roughly every 4.7 seconds (2^24 ticks at 3.579545 MHz), it's probably
worthwhile to wait for synchronization before restarting the guest in that
case. Not implemented, though.
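In the emulated path, the guest-visible counter stays continuous across migration because of a stored offset. The destination computes `pmtimer_offset = saved - local` once, in this patch's `pm_load()` changes, and `get_pmtmr()` then returns `(local + offset) & 0xffffff`. Below is a standalone sketch of that arithmetic. The helper names are hypothetical, and it assumes the standard 3.579545 MHz ACPI PM timer frequency.

```c
#include <stdint.h>

#define PM_FREQ_HZ 3579545u     /* ACPI PM timer frequency, ~3.58 MHz */
#define PMTMR_MASK 0xffffffu    /* the counter is 24 bits wide */

/* Seconds between wraparounds of a 24-bit counter at PM_FREQ_HZ. */
static double pmtmr_wrap_seconds(void)
{
    return (double)(PMTMR_MASK + 1u) / PM_FREQ_HZ;
}

/* Destination side: how far the migrated value is from the local
 * counter (what pm_load() stores in pmtimer_offset). Both arguments
 * are 24-bit values, so the signed subtraction cannot overflow. */
static int32_t pmtmr_offset(uint32_t saved, uint32_t local_now)
{
    return (int32_t)saved - (int32_t)local_now;
}

/* Guest-visible value: local clock (possibly wider than 24 bits) plus
 * the offset, truncated to 24 bits, like the emulated get_pmtmr(). */
static uint32_t pmtmr_read(uint32_t local_now, int32_t offset)
{
    return (local_now + (uint32_t)offset) & PMTMR_MASK;
}
```

The offset and the read are both taken modulo 2^24, so the result stays correct even after the destination's local counter has wrapped.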
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Index: kvm-userspace.realtip/bios/rombios32.c
===================================================================
--- kvm-userspace.realtip.orig/bios/rombios32.c
+++ kvm-userspace.realtip/bios/rombios32.c
@@ -391,7 +391,7 @@ uint8_t bios_uuid[16];
unsigned long ebda_cur_addr;
#endif
int acpi_enabled;
-uint32_t pm_io_base, smb_io_base;
+uint32_t pm_io_base, pmtmr_base, smb_io_base;
int pm_sci_int;
unsigned long bios_table_cur_addr;
unsigned long bios_table_end_addr;
@@ -819,6 +819,12 @@ static void pci_bios_init_device(PCIDevi
pci_config_writeb(d, PCI_INTERRUPT_LINE, 9);
pm_io_base = PM_IO_BASE;
+ pmtmr_base = cmos_readb(0x60);
+ pmtmr_base |= cmos_readb(0x61) << 8;
+ pmtmr_base |= cmos_readb(0x62) << 16;
+ pmtmr_base |= cmos_readb(0x63) << 24;
+ if (!pmtmr_base)
+ pmtmr_base = pm_io_base + 0x08;
pci_config_writel(d, 0x40, pm_io_base | 1);
pci_config_writeb(d, 0x80, 0x01); /* enable PM io space */
smb_io_base = SMB_IO_BASE;
@@ -1376,7 +1382,7 @@ void acpi_bios_init(void)
fadt->acpi_disable = 0xf0;
fadt->pm1a_evt_blk = cpu_to_le32(pm_io_base);
fadt->pm1a_cnt_blk = cpu_to_le32(pm_io_base + 0x04);
- fadt->pm_tmr_blk = cpu_to_le32(pm_io_base + 0x08);
+ fadt->pm_tmr_blk = cpu_to_le32(pmtmr_base);
fadt->pm1_evt_len = 4;
fadt->pm1_cnt_len = 2;
fadt->pm_tmr_len = 4;
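The host PM timer port reaches the guest BIOS through CMOS bytes 0x60 to 0x63. QEMU's `cmos_init()` writes the 32-bit `host_pmtimer_base` one byte at a time, least significant byte first. `pci_bios_init_device()` reassembles the value and falls back to `pm_io_base + 0x08` when it is zero. The sketch below shows that round trip against a hypothetical in-memory CMOS array (`cmos_sketch`). The helper names are illustrative only.

```c
#include <stdint.h>

/* Hypothetical stand-in for the RTC/CMOS NVRAM: 128 byte-wide cells. */
static uint8_t cmos_sketch[128];

/* QEMU side (cf. cmos_init): store the 32-bit base little-endian,
 * one byte per CMOS cell starting at 0x60. */
static void cmos_store_pmtmr_base(uint32_t base)
{
    for (int i = 0; i < 4; i++)
        cmos_sketch[0x60 + i] = (uint8_t)(base >> (8 * i));
}

/* BIOS side (cf. pci_bios_init_device): reassemble the bytes, falling
 * back to the PIIX4 default pm_io_base + 0x08 when no base was passed. */
static uint32_t cmos_load_pmtmr_base(uint32_t pm_io_base)
{
    uint32_t base = 0;

    for (int i = 0; i < 4; i++)
        base |= (uint32_t)cmos_sketch[0x60 + i] << (8 * i);
    return base ? base : pm_io_base + 0x08;
}
```

A zero in all four bytes means "no direct pmtimer." This is the case whenever QEMU leaves `host_pmtimer_base` at 0.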
Index: kvm-userspace.realtip/qemu/hw/acpi.c
===================================================================
--- kvm-userspace.realtip.orig/qemu/hw/acpi.c
+++ kvm-userspace.realtip/qemu/hw/acpi.c
@@ -40,6 +40,10 @@ typedef struct PIIX4PMState {
uint16_t pmsts;
uint16_t pmen;
uint16_t pmcntrl;
+ uint32_t pmtimer_base;
+ uint8_t direct_access;
+ int32_t pmtimer_offset;
+ uint32_t pmtimer_io_offset;
uint8_t apmc;
uint8_t apms;
QEMUTimer *tmr_timer;
@@ -81,7 +85,12 @@ PIIX4PMState *pm_state;
static uint32_t get_pmtmr(PIIX4PMState *s)
{
uint32_t d;
- d = muldiv64(qemu_get_clock(vm_clock), PM_FREQ, ticks_per_sec);
+ if (!s->direct_access) {
+ d = muldiv64(qemu_get_clock(vm_clock), PM_FREQ, ticks_per_sec);
+ d += s->pmtimer_offset;
+ } else
+ qemu_kvm_get_pmtimer(&d);
+
return d & 0xffffff;
}
@@ -235,14 +244,10 @@ static uint32_t pm_ioport_readl(void *op
uint32_t val;
addr &= 0x3f;
- switch(addr) {
- case 0x08:
+ if (addr == s->pmtimer_io_offset)
val = get_pmtmr(s);
- break;
- default:
+ else
val = 0;
- break;
- }
#ifdef DEBUG
printf("PM readl port=0x%04x val=0x%08x\n", addr, val);
#endif
@@ -433,9 +438,9 @@ static uint32_t smb_ioport_readb(void *o
return val;
}
-static void pm_io_space_update(PIIX4PMState *s)
+static void pm_io_space_update(PIIX4PMState *s, int migration)
{
- uint32_t pm_io_base;
+ uint32_t pm_io_base, pmtmr_len;
if (s->dev.config[0x80] & 1) {
pm_io_base = le32_to_cpu(*(uint32_t *)(s->dev.config + 0x40));
@@ -443,14 +448,29 @@ static void pm_io_space_update(PIIX4PMSt
/* XXX: need to improve memory and ioport allocation */
#if defined(DEBUG)
- printf("PM: mapping to 0x%x\n", pm_io_base);
+ printf("PM: mapping to 0x%x mig=%d\n", pm_io_base, migration);
#endif
register_ioport_write(pm_io_base, 64, 1, pm_ioport_writeb, s);
register_ioport_read(pm_io_base, 64, 1, pm_ioport_readb, s);
register_ioport_write(pm_io_base, 64, 2, pm_ioport_writew, s);
register_ioport_read(pm_io_base, 64, 2, pm_ioport_readw, s);
- register_ioport_write(pm_io_base, 64, 4, pm_ioport_writel, s);
- register_ioport_read(pm_io_base, 64, 4, pm_ioport_readl, s);
+
+ if (migration) {
+ s->pmtimer_io_offset = 0x08;
+ pmtmr_len = 64;
+ } else if (host_pmtimer_base) {
+ s->pmtimer_base = host_pmtimer_base;
+ s->pmtimer_io_offset = 0x0;
+ pmtmr_len = 4;
+ s->direct_access = 1;
+ } else {
+ s->pmtimer_base = pm_io_base;
+ s->pmtimer_io_offset = 0x08;
+ pmtmr_len = 64;
+ }
+
+ register_ioport_write(s->pmtimer_base, pmtmr_len, 4, pm_ioport_writel, s);
+ register_ioport_read(s->pmtimer_base, pmtmr_len, 4, pm_ioport_readl, s);
}
}
@@ -459,12 +479,13 @@ static void pm_write_config(PCIDevice *d
{
pci_default_write_config(d, address, val, len);
if (address == 0x80)
- pm_io_space_update((PIIX4PMState *)d);
+ pm_io_space_update((PIIX4PMState *)d, 0);
}
static void pm_save(QEMUFile* f,void *opaque)
{
PIIX4PMState *s = opaque;
+ uint32_t pmtmr_val;
pci_device_save(&s->dev, f);
@@ -475,6 +496,14 @@ static void pm_save(QEMUFile* f,void *op
qemu_put_8s(f, &s->apms);
qemu_put_timer(f, s->tmr_timer);
qemu_put_be64(f, s->tmr_overflow_time);
+ qemu_put_be32(f, s->pmtimer_base);
+ if (s->direct_access) {
+ if (qemu_kvm_get_pmtimer(&pmtmr_val) < 0)
+ pmtmr_val = 1 << 30;
+ } else
+ pmtmr_val = get_pmtmr(s);
+
+ qemu_put_be32(f, pmtmr_val);
}
static int pm_load(QEMUFile* f,void* opaque,int version_id)
@@ -482,7 +511,7 @@ static int pm_load(QEMUFile* f,void* opa
PIIX4PMState *s = opaque;
int ret;
- if (version_id > 1)
+ if (version_id > 2)
return -EINVAL;
ret = pci_device_load(&s->dev, f);
@@ -496,10 +525,31 @@ static int pm_load(QEMUFile* f,void* opa
qemu_get_8s(f, &s->apms);
qemu_get_timer(f, s->tmr_timer);
s->tmr_overflow_time=qemu_get_be64(f);
+ if (version_id >= 2) {
+ uint32_t pmtmr_val;
- pm_io_space_update(s);
+ s->pmtimer_base = qemu_get_be32(f);
+ pmtmr_val = qemu_get_be32(f);
+ if (pmtmr_val & (1 << 30))
+ return -EINVAL;
+#ifdef KVM_CAP_OPEN_IOPORT
+ /*
+ * Could wait for synchronicity instead of closing
+ * direct access.
+ */
+ if (host_pmtimer_base) {
+ ret = kvm_close_direct_pmtimer();
+ if (ret)
+ return ret;
+ host_pmtimer_base = 0;
+ }
+#endif
+ s->pmtimer_offset = pmtmr_val - get_pmtmr(s);
+ }
- return 0;
+ pm_io_space_update(s, 1);
+
+ return 0;
}
i2c_bus *piix4_pm_init(PCIBus *bus, int devfn, uint32_t smb_io_base,
@@ -548,7 +598,7 @@ i2c_bus *piix4_pm_init(PCIBus *bus, int
s->tmr_timer = qemu_new_timer(vm_clock, pm_tmr_timer, s);
- register_savevm("piix4_pm", 0, 1, pm_save, pm_load, s);
+ register_savevm("piix4_pm", 0, 2, pm_save, pm_load, s);
s->smbus = i2c_init_bus();
s->irq = sci_irq;
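When the host timer cannot be read, `pm_save()` writes `1 << 30` in place of the timer value. Every genuine reading is masked to 24 bits, so a real value can never have bit 30 set, and `pm_load()` can use that bit to refuse the state. `pm_load()` reads the two new fields only for `version_id >= 2`, so version-1 streams still load. Below is a minimal sketch of the sentinel check; the helper names are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Marker pm_save() writes when the host timer could not be read.
 * A real reading is masked to 24 bits, so bit 30 is never set by one. */
#define PMTMR_SAVE_INVALID (1u << 30)

/* Source side: choose the value to put in the migration stream. */
static uint32_t pmtmr_value_to_save(int read_ok, uint32_t value)
{
    return read_ok ? (value & 0xffffffu) : PMTMR_SAVE_INVALID;
}

/* Destination side: validate a value from the stream, as pm_load()
 * does, before using it; -EINVAL aborts the load. */
static int pmtmr_value_check(uint32_t saved)
{
    return (saved & PMTMR_SAVE_INVALID) ? -EINVAL : 0;
}
```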
Index: kvm-userspace.realtip/qemu/hw/pc.c
===================================================================
--- kvm-userspace.realtip.orig/qemu/hw/pc.c
+++ kvm-userspace.realtip/qemu/hw/pc.c
@@ -253,6 +253,11 @@ static void cmos_init(ram_addr_t ram_siz
}
rtc_set_memory(s, 0x5f, smp_cpus - 1);
+ rtc_set_memory(s, 0x60, host_pmtimer_base);
+ rtc_set_memory(s, 0x61, host_pmtimer_base >> 8);
+ rtc_set_memory(s, 0x62, host_pmtimer_base >> 16);
+ rtc_set_memory(s, 0x63, host_pmtimer_base >> 24);
+
if (ram_size > (16 * 1024 * 1024))
val = (ram_size / 65536) - ((16 * 1024 * 1024) / 65536);
else
Index: kvm-userspace.realtip/qemu/qemu-kvm-x86.c
===================================================================
--- kvm-userspace.realtip.orig/qemu/qemu-kvm-x86.c
+++ kvm-userspace.realtip/qemu/qemu-kvm-x86.c
@@ -11,12 +11,17 @@
#include <string.h>
#include "hw/hw.h"
+#include "sysemu.h"
#include "qemu-kvm.h"
#include <libkvm.h>
#include <pthread.h>
#include <sys/utsname.h>
#include <linux/kvm_para.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
#define MSR_IA32_TSC 0x10
@@ -545,6 +550,123 @@ static int get_para_features(kvm_context
return features;
}
+#ifdef KVM_CAP_OPEN_IOPORT
+int kvm_arch_open_pmtimer(void)
+{
+ int fd, ret = 0;
+ char buf[16384];
+ char *line, *saveptr;
+ uint32_t pmtmr;
+ struct kvm_ioport_list *ioport_list;
+
+ if (no_direct_pmtimer)
+ return ret;
+
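+ fd = open("/proc/ioports", O_RDONLY);
+ if (fd == -1) {
+ perror("open /proc/ioports");
+ exit(1);
+ }
+ /* leave room for the terminating NUL that strtok_r() relies on */
+ ret = read(fd, buf, sizeof(buf) - 1);
+ close(fd);
+ if (ret == -1) {
+ perror("read /proc/ioports");
+ exit(1);
+ }
+ buf[ret] = '\0';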
+
+ line = strtok_r(buf, "\n", &saveptr);
+ do {
+ char *pmstr;
+ line = pmstr = strtok_r(NULL, "\n", &saveptr);
+ if (pmstr && strstr(pmstr, "ACPI PM_TMR")) {
+ pmstr = strtok(line, "-");
+ while (*pmstr == ' ')
+ pmstr++;
+ host_pmtimer_base = strtoul(pmstr, NULL, 16);
+ /*
+ * Fail now instead of during migration
+ */
+ if (qemu_kvm_get_pmtimer(&pmtmr) < 0)
+ host_pmtimer_base = 0;
+ break;
+ }
+ } while (line);
+
+ if (!host_pmtimer_base)
+ return 0;
+
+ ioport_list = qemu_malloc(sizeof(struct kvm_ioport_list) +
+ sizeof(struct kvm_ioport) * 2);
+ if (!ioport_list)
+ goto out_no_pmtimer;
+ ioport_list->nranges = 2;
+ ioport_list->ioports[0].addr = 0x80;
+ ioport_list->ioports[0].len = 1;
+ ioport_list->ioports[1].addr = host_pmtimer_base;
+ ioport_list->ioports[1].len = 4;
+
+ ret = kvm_set_open_ioports(kvm_context, ioport_list);
+ if (ret) {
+ perror("kvm_set_open_ioports");
+ goto out_no_pmtimer_free;
+ }
+
+ qemu_free(ioport_list);
+ return 0;
+
+out_no_pmtimer_free:
+ qemu_free(ioport_list);
+out_no_pmtimer:
+ host_pmtimer_base = 0;
+ return 0;
+}
+
+int kvm_close_direct_pmtimer(void)
+{
+ struct kvm_ioport_list *ioport_list;
+ int ret;
+
+ ioport_list = qemu_malloc(sizeof(struct kvm_ioport_list) +
+ sizeof(struct kvm_ioport));
+ if (!ioport_list)
+ return -EINVAL;
+ ioport_list->nranges = 1;
+ ioport_list->ioports[0].addr = 0x80;
+ ioport_list->ioports[0].len = 1;
+
+ ret = kvm_set_open_ioports(kvm_context, ioport_list);
+
+ qemu_free(ioport_list);
+ return ret;
+}
+#else
+int kvm_arch_open_pmtimer(void)
+{
+ return 0;
+}
+#endif
+
+int kvm_arch_qemu_init(void)
+{
+ kvm_arch_open_pmtimer();
+ return 0;
+}
+
+int qemu_kvm_get_pmtimer(uint32_t *value)
+{
+ int fd, ret;
+
+ fd = open("/dev/pmtimer", O_RDONLY);
+ if (fd == -1)
+ return -1;
+
+ ret = read(fd, value, sizeof(*value));
+ close(fd);
+ if (ret != (int)sizeof(*value))
+ return -1;
+
+ *value &= 0xffffff;
+
+ return 0;
+}
+
int kvm_arch_qemu_init_env(CPUState *cenv)
{
struct kvm_cpuid_entry cpuid_ent[100];
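`kvm_arch_open_pmtimer()` finds the host timer by scanning `/proc/ioports` for an `ACPI PM_TMR` entry. Such lines look like `  0808-080b : ACPI PM_TMR`, though the port numbers differ between chipsets. The sketch below performs the same lookup on a NUL-terminated copy of the file. Unlike the patch, it matches the resource name exactly rather than with `strstr()`. The helper names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Parse one /proc/ioports line such as "  0808-080b : ACPI PM_TMR".
 * Returns 1 and stores the start port if the resource name matches. */
static int ioports_line_match(const char *line, const char *name,
                              unsigned int *start)
{
    unsigned int lo, hi;
    int off = 0;

    /* off stays 0 unless the " : " separator was matched as well */
    if (sscanf(line, " %x-%x : %n", &lo, &hi, &off) != 2 || off == 0)
        return 0;
    if (strcmp(line + off, name) != 0)
        return 0;
    *start = lo;
    return 1;
}

/* Scan a NUL-terminated copy of /proc/ioports line by line. */
static int find_ioport_base(const char *text, const char *name,
                            unsigned int *start)
{
    char line[256];
    const char *p = text;

    while (*p) {
        size_t len = strcspn(p, "\n");

        if (len < sizeof(line)) {
            memcpy(line, p, len);
            line[len] = '\0';
            if (ioports_line_match(line, name, start))
                return 1;
        }
        p += len;
        if (*p == '\n')
            p++;
    }
    return 0;
}
```

Parsing with `sscanf()` and `%n` avoids the hidden state of `strtok()` and never reads past the terminating NUL. The nested (indented) entries in `/proc/ioports` are handled by the leading space in the format string.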
Index: kvm-userspace.realtip/qemu/qemu-kvm.c
===================================================================
--- kvm-userspace.realtip.orig/qemu/qemu-kvm.c
+++ kvm-userspace.realtip/qemu/qemu-kvm.c
@@ -677,6 +677,7 @@ int kvm_qemu_create_context(void)
r = kvm_arch_qemu_create_context();
if(r <0)
kvm_qemu_destroy();
+ kvm_arch_qemu_init();
return 0;
}
Index: kvm-userspace.realtip/qemu/qemu-kvm.h
===================================================================
--- kvm-userspace.realtip.orig/qemu/qemu-kvm.h
+++ kvm-userspace.realtip/qemu/qemu-kvm.h
@@ -49,6 +49,7 @@ void kvm_cpu_destroy_phys_mem(target_phy
unsigned long size);
int kvm_arch_qemu_create_context(void);
+int kvm_arch_qemu_init(void);
void kvm_arch_save_regs(CPUState *env);
void kvm_arch_load_regs(CPUState *env);
@@ -60,6 +61,8 @@ int kvm_arch_has_work(CPUState *env);
int kvm_arch_try_push_interrupts(void *opaque);
void kvm_arch_update_regs_for_sipi(CPUState *env);
void kvm_arch_cpu_reset(CPUState *env);
+int qemu_kvm_get_pmtimer(uint32_t *value);
+int kvm_close_direct_pmtimer(void);
CPUState *qemu_kvm_cpu_env(int index);
Index: kvm-userspace.realtip/qemu/sysemu.h
===================================================================
--- kvm-userspace.realtip.orig/qemu/sysemu.h
+++ kvm-userspace.realtip/qemu/sysemu.h
@@ -94,6 +94,7 @@ extern int win2k_install_hack;
extern int alt_grab;
extern int usb_enabled;
extern int smp_cpus;
+extern unsigned int host_pmtimer_base;
extern int cursor_hide;
extern int graphic_rotate;
extern int no_quit;
@@ -101,6 +102,7 @@ extern int semihosting_enabled;
extern int autostart;
extern int old_param;
extern int hpagesize;
+extern int no_direct_pmtimer;
extern const char *bootp_filename;
Index: kvm-userspace.realtip/qemu/vl.c
===================================================================
--- kvm-userspace.realtip.orig/qemu/vl.c
+++ kvm-userspace.realtip/qemu/vl.c
@@ -209,6 +209,7 @@ int win2k_install_hack = 0;
int usb_enabled = 0;
static VLANState *first_vlan;
int smp_cpus = 1;
+unsigned int host_pmtimer_base;
const char *vnc_display;
#if defined(TARGET_SPARC)
#define MAX_CPUS 16
@@ -235,6 +236,7 @@ int time_drift_fix = 0;
unsigned int kvm_shadow_memory = 0;
const char *mem_path = NULL;
int hpagesize = 0;
+int no_direct_pmtimer = 0;
const char *cpu_vendor_string;
#ifdef TARGET_ARM
int old_param = 0;
@@ -7931,6 +7933,7 @@ enum {
QEMU_OPTION_tdf,
QEMU_OPTION_kvm_shadow_memory,
QEMU_OPTION_mempath,
+ QEMU_OPTION_no_direct_pmtimer,
};
typedef struct QEMUOption {
@@ -8058,6 +8061,7 @@ const QEMUOption qemu_options[] = {
{ "clock", HAS_ARG, QEMU_OPTION_clock },
{ "startdate", HAS_ARG, QEMU_OPTION_startdate },
{ "mem-path", HAS_ARG, QEMU_OPTION_mempath },
+ { "no-direct-pmtimer", 0, QEMU_OPTION_no_direct_pmtimer },
{ NULL },
};
@@ -8962,6 +8966,9 @@ int main(int argc, char **argv)
case QEMU_OPTION_mempath:
mem_path = optarg;
break;
+ case QEMU_OPTION_no_direct_pmtimer:
+ no_direct_pmtimer = 1;
+ break;
case QEMU_OPTION_name:
qemu_name = optarg;
break;
--
* [patch 09/12] libkvm: in-kernel ACPI pmtimer interface
2008-05-29 22:22 [patch 00/12] fake ACPI C2 emulation v2 Marcelo Tosatti
` (7 preceding siblings ...)
2008-05-29 22:22 ` [patch 08/12] QEMU/KVM: non-virtualized ACPI PMTimer support Marcelo Tosatti
@ 2008-05-29 22:22 ` Marcelo Tosatti
2008-05-29 22:22 ` [patch 10/12] QEMU/KVM: add option to disable in-kernel pmtimer emulation Marcelo Tosatti
` (3 subsequent siblings)
12 siblings, 0 replies; 27+ messages in thread
From: Marcelo Tosatti @ 2008-05-29 22:22 UTC
To: Avi Kivity
Cc: Chris Wright, Glauber Costa, Anthony Liguori, kvm,
Marcelo Tosatti
[-- Attachment #1: inkernel-acpi-timer --]
[-- Type: text/plain, Size: 3295 bytes --]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Index: kvm-userspace.realtip/libkvm/kvm-common.h
===================================================================
--- kvm-userspace.realtip.orig/libkvm/kvm-common.h
+++ kvm-userspace.realtip/libkvm/kvm-common.h
@@ -49,6 +49,10 @@ struct kvm_context {
int no_pit_creation;
/// in-kernel pit status
int pit_in_kernel;
+ // do not create in-kernel acpi timer if set
+ int no_acpi_timer_creation;
+ // in-kernel acpi timer status
+ int acpi_timer_in_kernel;
};
void init_slots(void);
Index: kvm-userspace.realtip/libkvm/libkvm-x86.c
===================================================================
--- kvm-userspace.realtip.orig/libkvm/libkvm-x86.c
+++ kvm-userspace.realtip/libkvm/libkvm-x86.c
@@ -166,6 +166,29 @@ int kvm_create_pit(kvm_context_t kvm)
return 0;
}
+int kvm_create_acpi_timer(kvm_context_t kvm)
+{
+#ifdef KVM_CAP_ACPI_TIMER
+ int r;
+
+ kvm->acpi_timer_in_kernel = 0;
+ if (!kvm->no_acpi_timer_creation) {
+ r = ioctl(kvm->fd, KVM_CHECK_EXTENSION, KVM_CAP_ACPI_TIMER);
+ if (r > 0) {
+ r = ioctl(kvm->vm_fd, KVM_CREATE_ACPI_TIMER);
+ if (r >= 0)
+ kvm->acpi_timer_in_kernel = 1;
+ else {
+ fprintf(stderr,
+ "Create kernel pmtimer failed\n");
+ return r;
+ }
+ }
+ }
+#endif
+ return 0;
+}
+
int kvm_arch_create(kvm_context_t kvm, unsigned long phys_mem_bytes,
void **vm_mem)
{
@@ -179,6 +202,10 @@ int kvm_arch_create(kvm_context_t kvm, u
if (r < 0)
return r;
+ r = kvm_create_acpi_timer(kvm);
+ if (r < 0)
+ return r;
+
return 0;
}
Index: kvm-userspace.realtip/libkvm/libkvm.c
===================================================================
--- kvm-userspace.realtip.orig/libkvm/libkvm.c
+++ kvm-userspace.realtip/libkvm/libkvm.c
@@ -277,6 +277,11 @@ void kvm_disable_pit_creation(kvm_contex
kvm->no_pit_creation = 1;
}
+void kvm_disable_acpi_timer_creation(kvm_context_t kvm)
+{
+ kvm->no_acpi_timer_creation = 1;
+}
+
int kvm_create_vcpu(kvm_context_t kvm, int slot)
{
long mmap_size;
@@ -994,3 +999,8 @@ int kvm_pit_in_kernel(kvm_context_t kvm)
{
return kvm->pit_in_kernel;
}
+
+int kvm_acpi_timer_in_kernel(kvm_context_t kvm)
+{
+ return kvm->acpi_timer_in_kernel;
+}
Index: kvm-userspace.realtip/libkvm/libkvm.h
===================================================================
--- kvm-userspace.realtip.orig/libkvm/libkvm.h
+++ kvm-userspace.realtip/libkvm/libkvm.h
@@ -117,6 +117,16 @@ void kvm_disable_irqchip_creation(kvm_co
void kvm_disable_pit_creation(kvm_context_t kvm);
/*!
+ * \brief Disable the in-kernel ACPI timer creation
+ *
+ * In-kernel acpi timer is enabled by default. If userspace acpi timer is to be used,
+ * this should be called prior to kvm_create().
+ *
+ * \param kvm Pointer to the kvm_context
+ */
+void kvm_disable_acpi_timer_creation(kvm_context_t kvm);
+
+/*!
* \brief Create new virtual machine
*
* This creates a new virtual machine, maps physical RAM to it, and creates a
@@ -599,6 +609,13 @@ int kvm_set_pit(kvm_context_t kvm, struc
#endif
+/*!
+ * \brief Query whether the in-kernel ACPI timer is used
+ *
+ * \param kvm Pointer to the current kvm_context
+ */
+int kvm_acpi_timer_in_kernel(kvm_context_t kvm);
+
#ifdef KVM_CAP_OPEN_IOPORT
/*!
--
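`kvm_create_acpi_timer()` distinguishes three outcomes. If creation was disabled by the user, nothing happens. If the capability is missing, that is not an error, and QEMU keeps emulating the timer. If the capability is present but creation fails, that is an error that aborts `kvm_arch_create()`. The sketch below shows that decision flow with the two ioctls (`KVM_CHECK_EXTENSION` and `KVM_CREATE_ACPI_TIMER`) replaced by hypothetical stub functions, so it runs without `/dev/kvm`.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the two ioctls the patch issues. */
typedef int (*probe_fn)(void);
typedef int (*create_fn)(void);

struct acpi_timer_ctx {
    int no_acpi_timer_creation;  /* set by the opt-out call */
    int acpi_timer_in_kernel;    /* result reported to the caller */
};

static int create_acpi_timer_sketch(struct acpi_timer_ctx *kvm,
                                    probe_fn probe, create_fn create)
{
    int r;

    kvm->acpi_timer_in_kernel = 0;
    if (kvm->no_acpi_timer_creation)
        return 0;              /* user asked for the QEMU timer */
    if (probe() <= 0)
        return 0;              /* capability missing: not an error */
    r = create();
    if (r < 0) {
        fprintf(stderr, "Create kernel pmtimer failed\n");
        return r;              /* capability present, creation failed */
    }
    kvm->acpi_timer_in_kernel = 1;
    return 0;
}

/* Example stubs (hypothetical) for exercising each branch. */
static int probe_present(void) { return 1; }
static int probe_absent(void)  { return 0; }
static int create_ok(void)     { return 0; }
static int create_fails(void)  { return -1; }
```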
* [patch 10/12] QEMU/KVM: add option to disable in-kernel pmtimer emulation
2008-05-29 22:22 [patch 00/12] fake ACPI C2 emulation v2 Marcelo Tosatti
` (8 preceding siblings ...)
2008-05-29 22:22 ` [patch 09/12] libkvm: in-kernel ACPI pmtimer interface Marcelo Tosatti
@ 2008-05-29 22:22 ` Marcelo Tosatti
2008-05-29 22:23 ` [patch 11/12] libkvm: interface for pmtimer save/restore Marcelo Tosatti
` (2 subsequent siblings)
12 siblings, 0 replies; 27+ messages in thread
From: Marcelo Tosatti @ 2008-05-29 22:22 UTC
To: Avi Kivity
Cc: Chris Wright, Glauber Costa, Anthony Liguori, kvm,
Marcelo Tosatti
[-- Attachment #1: disable-inkernel-acpi-timer --]
[-- Type: text/plain, Size: 2908 bytes --]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Index: kvm-userspace.realtip/qemu/qemu-kvm.c
===================================================================
--- kvm-userspace.realtip.orig/qemu/qemu-kvm.c
+++ kvm-userspace.realtip/qemu/qemu-kvm.c
@@ -11,6 +11,7 @@
int kvm_allowed = 1;
int kvm_irqchip = 1;
int kvm_pit = 1;
+int kvm_acpi_timer = 1;
#include <assert.h>
#include <string.h>
@@ -670,6 +671,9 @@ int kvm_qemu_create_context(void)
if (!kvm_pit) {
kvm_disable_pit_creation(kvm_context);
}
+ if (!kvm_acpi_timer) {
+ kvm_disable_acpi_timer_creation(kvm_context);
+ }
if (kvm_create(kvm_context, phys_ram_size, (void**)&phys_ram_base) < 0) {
kvm_qemu_destroy();
return -1;
Index: kvm-userspace.realtip/qemu/qemu-kvm.h
===================================================================
--- kvm-userspace.realtip.orig/qemu/qemu-kvm.h
+++ kvm-userspace.realtip/qemu/qemu-kvm.h
@@ -111,10 +111,12 @@ extern kvm_context_t kvm_context;
#define kvm_enabled() (kvm_allowed)
#define qemu_kvm_irqchip_in_kernel() kvm_irqchip_in_kernel(kvm_context)
#define qemu_kvm_pit_in_kernel() kvm_pit_in_kernel(kvm_context)
+#define qemu_kvm_acpi_timer_in_kernel() kvm_acpi_timer_in_kernel(kvm_context)
#else
#define kvm_enabled() (0)
#define qemu_kvm_irqchip_in_kernel() (0)
#define qemu_kvm_pit_in_kernel() (0)
+#define qemu_kvm_acpi_timer_in_kernel() (0)
#endif
void kvm_mutex_unlock(void);
Index: kvm-userspace.realtip/qemu/vl.c
===================================================================
--- kvm-userspace.realtip.orig/qemu/vl.c
+++ kvm-userspace.realtip/qemu/vl.c
@@ -7792,6 +7792,7 @@ static void help(int exitcode)
#endif
"-no-kvm-irqchip disable KVM kernel mode PIC/IOAPIC/LAPIC\n"
"-no-kvm-pit disable KVM kernel mode PIT\n"
+ "-no-kvm-acpi-timer disable KVM kernel mode ACPI timer\n"
#endif
#ifdef TARGET_I386
"-std-vga simulate a standard VGA card with VESA Bochs Extensions\n"
@@ -7916,6 +7917,7 @@ enum {
QEMU_OPTION_no_kvm,
QEMU_OPTION_no_kvm_irqchip,
QEMU_OPTION_no_kvm_pit,
+ QEMU_OPTION_no_kvm_acpi_timer,
QEMU_OPTION_no_reboot,
QEMU_OPTION_no_shutdown,
QEMU_OPTION_show_cursor,
@@ -8005,6 +8007,7 @@ const QEMUOption qemu_options[] = {
#endif
{ "no-kvm-irqchip", 0, QEMU_OPTION_no_kvm_irqchip },
{ "no-kvm-pit", 0, QEMU_OPTION_no_kvm_pit },
+ { "no-kvm-acpi-timer", 0, QEMU_OPTION_no_kvm_acpi_timer },
#endif
#if defined(TARGET_PPC) || defined(TARGET_SPARC)
{ "g", 1, QEMU_OPTION_g },
@@ -8908,6 +8911,11 @@ int main(int argc, char **argv)
kvm_pit = 0;
break;
}
+ case QEMU_OPTION_no_kvm_acpi_timer: {
+ extern int kvm_acpi_timer;
+ kvm_acpi_timer = 0;
+ break;
+ }
#endif
case QEMU_OPTION_usb:
usb_enabled = 1;
--
* [patch 11/12] libkvm: interface for pmtimer save/restore
2008-05-29 22:22 [patch 00/12] fake ACPI C2 emulation v2 Marcelo Tosatti
` (9 preceding siblings ...)
2008-05-29 22:22 ` [patch 10/12] QEMU/KVM: add option to disable in-kernel pmtimer emulation Marcelo Tosatti
@ 2008-05-29 22:23 ` Marcelo Tosatti
2008-05-29 22:23 ` [patch 12/12] QEMU/KVM: in-kernel pmtimer save/restore support Marcelo Tosatti
2008-06-01 9:21 ` [patch 00/12] fake ACPI C2 emulation v2 Avi Kivity
12 siblings, 0 replies; 27+ messages in thread
From: Marcelo Tosatti @ 2008-05-29 22:23 UTC
To: Avi Kivity
Cc: Chris Wright, Glauber Costa, Anthony Liguori, kvm,
Marcelo Tosatti
[-- Attachment #1: inkernel-acpi-migration --]
[-- Type: text/plain, Size: 1954 bytes --]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Index: kvm-userspace.realtip/libkvm/libkvm-x86.c
===================================================================
--- kvm-userspace.realtip.orig/libkvm/libkvm-x86.c
+++ kvm-userspace.realtip/libkvm/libkvm-x86.c
@@ -418,6 +418,37 @@ int kvm_set_pit(kvm_context_t kvm, struc
#endif
+#ifdef KVM_CAP_ACPI_TIMER
+
+int kvm_get_acpi_timer(kvm_context_t kvm, struct kvm_acpi_timer_state *s)
+{
+ int r;
+ if (!kvm->acpi_timer_in_kernel)
+ return 0;
+ r = ioctl(kvm->vm_fd, KVM_GET_ACPI_TIMER, s);
+ if (r != 0) {
+ r = -errno;
+ perror("kvm_get_acpi_timer");
+ }
+ return r;
+}
+
+int kvm_set_acpi_timer(kvm_context_t kvm, struct kvm_acpi_timer_state *s)
+{
+ int r;
+ if (!kvm->acpi_timer_in_kernel)
+ return 0;
+ r = ioctl(kvm->vm_fd, KVM_SET_ACPI_TIMER, s);
+ if (r != 0) {
+ r = -errno;
+ perror("kvm_set_acpi_timer");
+ }
+
+ return r;
+}
+
+#endif
+
void kvm_show_code(kvm_context_t kvm, int vcpu)
{
#define SHOW_CODE_LEN 50
Index: kvm-userspace.realtip/libkvm/libkvm.h
===================================================================
--- kvm-userspace.realtip.orig/libkvm/libkvm.h
+++ kvm-userspace.realtip/libkvm/libkvm.h
@@ -616,6 +616,28 @@ int kvm_set_pit(kvm_context_t kvm, struc
*/
int kvm_acpi_timer_in_kernel(kvm_context_t kvm);
+#ifdef KVM_CAP_ACPI_TIMER
+
+#if defined(__i386__) || defined(__x86_64__)
+/*!
+ * \brief Get in kernel ACPI timer of the virtual domain
+ *
+ * \param kvm Pointer to the current kvm_context
+ * \param s ACPI timer state of the virtual domain
+ */
+int kvm_get_acpi_timer(kvm_context_t kvm, struct kvm_acpi_timer_state *s);
+
+/*!
+ * \brief Set in kernel ACPI timer of the virtual domain
+ *
+ * \param kvm Pointer to the current kvm_context
+ * \param s ACPI timer state of the virtual domain
+ */
+int kvm_set_acpi_timer(kvm_context_t kvm, struct kvm_acpi_timer_state *s);
+#endif
+
+#endif
+
#ifdef KVM_CAP_OPEN_IOPORT
/*!
--
* [patch 12/12] QEMU/KVM: in-kernel pmtimer save/restore support
2008-05-29 22:22 [patch 00/12] fake ACPI C2 emulation v2 Marcelo Tosatti
` (10 preceding siblings ...)
2008-05-29 22:23 ` [patch 11/12] libkvm: interface for pmtimer save/restore Marcelo Tosatti
@ 2008-05-29 22:23 ` Marcelo Tosatti
2008-06-01 9:21 ` [patch 00/12] fake ACPI C2 emulation v2 Avi Kivity
12 siblings, 0 replies; 27+ messages in thread
From: Marcelo Tosatti @ 2008-05-29 22:23 UTC
To: Avi Kivity
Cc: Chris Wright, Glauber Costa, Anthony Liguori, kvm,
Marcelo Tosatti
[-- Attachment #1: inkernel-acpi-migration-2 --]
[-- Type: text/plain, Size: 1882 bytes --]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Index: kvm-userspace.realtip/qemu/hw/acpi.c
===================================================================
--- kvm-userspace.realtip.orig/qemu/hw/acpi.c
+++ kvm-userspace.realtip/qemu/hw/acpi.c
@@ -86,8 +86,16 @@ static uint32_t get_pmtmr(PIIX4PMState *
{
uint32_t d;
if (!s->direct_access) {
- d = muldiv64(qemu_get_clock(vm_clock), PM_FREQ, ticks_per_sec);
- d += s->pmtimer_offset;
+ if (qemu_kvm_acpi_timer_in_kernel()) {
+#ifdef KVM_CAP_ACPI_TIMER
+ struct kvm_acpi_timer_state acpi_state;
+ kvm_get_acpi_timer(kvm_context, &acpi_state);
+ return acpi_state.timer_val;
+#endif
+ } else {
+ d = muldiv64(qemu_get_clock(vm_clock), PM_FREQ, ticks_per_sec);
+ d += s->pmtimer_offset;
+ }
} else
qemu_kvm_get_pmtimer(&d);
@@ -482,6 +490,22 @@ static void pm_write_config(PCIDevice *d
pm_io_space_update((PIIX4PMState *)d, 0);
}
+#ifdef KVM_CAP_ACPI_TIMER
+static int kvm_restore_acpi_timer(uint32_t pmtmr_val, uint32_t pmtimer_base)
+{
+ struct kvm_acpi_timer_state acpi_state;
+ acpi_state.timer_val = pmtmr_val;
+ acpi_state.base_address = pmtimer_base;
+
+ return kvm_set_acpi_timer(kvm_context, &acpi_state);
+}
+#else
+static int kvm_restore_acpi_timer(uint32_t pmtmr_val, uint32_t pmtimer_base)
+{
+ return -EINVAL;
+}
+#endif
+
static void pm_save(QEMUFile* f,void *opaque)
{
PIIX4PMState *s = opaque;
@@ -544,6 +568,12 @@ static int pm_load(QEMUFile* f,void* opa
host_pmtimer_base = 0;
}
#endif
+ if (qemu_kvm_acpi_timer_in_kernel()) {
+ ret = kvm_restore_acpi_timer(pmtmr_val, s->pmtimer_base);
+ if (ret)
+ return ret;
+ }
+
s->pmtimer_offset = pmtmr_val - get_pmtmr(s);
}
--
* Re: [patch 00/12] fake ACPI C2 emulation v2
2008-05-29 22:22 [patch 00/12] fake ACPI C2 emulation v2 Marcelo Tosatti
` (11 preceding siblings ...)
2008-05-29 22:23 ` [patch 12/12] QEMU/KVM: in-kernel pmtimer save/restore support Marcelo Tosatti
@ 2008-06-01 9:21 ` Avi Kivity
2008-06-02 16:08 ` Marcelo Tosatti
12 siblings, 1 reply; 27+ messages in thread
From: Avi Kivity @ 2008-06-01 9:21 UTC
To: Marcelo Tosatti; +Cc: Chris Wright, Glauber Costa, Anthony Liguori, kvm
Marcelo Tosatti wrote:
> Addressing comments on the previous patchset, follows:
>
> - Same fake C2 emulation
> - /dev/pmtimer
> - Support for multiple IO bitmap pages + userspace interface
> - In-kernel ACPI pmtimer emulation
>
> Tested with Linux and WinXP guests. Also tested migration.
>
Do you have any performance numbers, comparing qemu/kernel/passthrough?
[Real review will be delayed as I am travelling; will try to do as much
as I can]
--
error compiling committee.c: too many arguments to function
* Re: [patch 00/12] fake ACPI C2 emulation v2
2008-06-01 9:21 ` [patch 00/12] fake ACPI C2 emulation v2 Avi Kivity
@ 2008-06-02 16:08 ` Marcelo Tosatti
2008-06-04 10:49 ` Avi Kivity
0 siblings, 1 reply; 27+ messages in thread
From: Marcelo Tosatti @ 2008-06-02 16:08 UTC
To: Avi Kivity; +Cc: Chris Wright, Glauber Costa, Anthony Liguori, kvm
On Sun, Jun 01, 2008 at 12:21:29PM +0300, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> Addressing comments on the previous patchset, follows:
>>
>> - Same fake C2 emulation
>> - /dev/pmtimer
>> - Support for multiple IO bitmap pages + userspace interface
>> - In-kernel ACPI pmtimer emulation
>>
>> Tested with Linux and WinXP guests. Also tested migration.
>>
>
> Do you have any performance numbers, comparing qemu/kernel/passthrough?
Test is 1 million gettimeofday calls, Xeon 1.60GHz with 4MB L2.
guest (qemu emulation):
cycles:1189759332
guest (in-kernel emulation):
cycles:628046412
guest (direct pmtimer):
cycles:230372934
host (TSC):
cycles:14862774
> [Real review will be delayed as I am travelling; will try to do as much as
> I can]
OK!
* Re: [patch 00/12] fake ACPI C2 emulation v2
2008-06-02 16:08 ` Marcelo Tosatti
@ 2008-06-04 10:49 ` Avi Kivity
2008-06-05 3:12 ` Marcelo Tosatti
0 siblings, 1 reply; 27+ messages in thread
From: Avi Kivity @ 2008-06-04 10:49 UTC
To: Marcelo Tosatti; +Cc: Chris Wright, Glauber Costa, Anthony Liguori, kvm
Marcelo Tosatti wrote:
> On Sun, Jun 01, 2008 at 12:21:29PM +0300, Avi Kivity wrote:
>
>> Marcelo Tosatti wrote:
>>
>>> Addressing comments on the previous patchset, follows:
>>>
>>> - Same fake C2 emulation
>>> - /dev/pmtimer
>>> - Support for multiple IO bitmap pages + userspace interface
>>> - In-kernel ACPI pmtimer emulation
>>>
>>> Tested with Linux and WinXP guests. Also tested migration.
>>>
>>>
>> Do you have any performance numbers, comparing qemu/kernel/passthrough?
>>
>
> Test is 1 million gettimeofday calls, Xeon 1.60GHz with 4MB L2.
>
> guest (qemu emulation):
> cycles:1189759332
>
> guest (in-kernel emulation):
> cycles:628046412
>
> guest (direct pmtimer):
> cycles:230372934
>
> host (TSC):
> cycles:14862774
>
>
Ratio is 1:15:80
Looks like direct pmtimer is still quite slow. Are there any exits with
direct pmtimer, or is it all due to the ioport latency?
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
* Re: [patch 00/12] fake ACPI C2 emulation v2
2008-06-04 10:49 ` Avi Kivity
@ 2008-06-05 3:12 ` Marcelo Tosatti
2008-06-05 7:56 ` Avi Kivity
0 siblings, 1 reply; 27+ messages in thread
From: Marcelo Tosatti @ 2008-06-05 3:12 UTC (permalink / raw)
To: Avi Kivity; +Cc: Chris Wright, Glauber Costa, Anthony Liguori, kvm
On Wed, Jun 04, 2008 at 01:49:41PM +0300, Avi Kivity wrote:
>>
>> Test is 1 million gettimeofday calls, Xeon 1.60GHz with 4MB L2.
>>
>> guest (qemu emulation):
>> cycles:1189759332
>>
>> guest (in-kernel emulation):
>> cycles:628046412
>>
>> guest (direct pmtimer):
>> cycles:230372934
>>
>> host (TSC):
>> cycles:14862774
>>
>
> Ratio is 1:15:80
>
> Looks like direct pmtimer is still quite slow. Are there any exits with
> direct pmtimer, or is it all due to the ioport latency?
host (pmtimer):
cycles:225768390
So it's getting close to native performance. As you mentioned earlier,
acpi_pm can't benefit from vsyscalls.
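The acpi_pm clocksource reads a free-running counter at 3.579545 MHz. In its common form the counter is 24 bits wide and wraps about every 4.69 s. The sketch below shows two calculations involving that counter. The first derives the value an emulated pmtimer might return from a nanosecond clock. The second computes wrap-safe deltas between two reads. It assumes the 24-bit variant, and the function names are hypothetical. It shows the arithmetic only, not the code in this patch series.

```c
#include <stdint.h>

#define PMTMR_HZ     3579545u      /* ACPI PM timer frequency */
#define PMTMR_MASK   0xFFFFFFu     /* assumed 24-bit counter */
#define NSEC_PER_SEC 1000000000ull

/* Hypothetical emulation helper: counter value after ns nanoseconds.
 * Splitting into whole seconds and a remainder avoids overflowing
 * ns * PMTMR_HZ in 64 bits for large ns. */
static uint32_t pmtimer_from_ns(uint64_t ns)
{
    uint64_t sec = ns / NSEC_PER_SEC;
    uint64_t rem = ns % NSEC_PER_SEC;
    uint64_t ticks = sec * PMTMR_HZ + rem * PMTMR_HZ / NSEC_PER_SEC;
    return (uint32_t)(ticks & PMTMR_MASK);
}

/* Ticks elapsed between two reads; correct across at most one wrap. */
static uint32_t pmtimer_delta(uint32_t prev, uint32_t now)
{
    return (now - prev) & PMTMR_MASK;
}

/* Convert a tick count to nanoseconds. */
static uint64_t pmtimer_ticks_to_ns(uint32_t ticks)
{
    return (uint64_t)ticks * NSEC_PER_SEC / PMTMR_HZ;
}
```

For example, pmtimer_from_ns(5000000000) wraps to 1120509, because 5 s is 17897725 ticks, which exceeds 2^24.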
* Re: [patch 00/12] fake ACPI C2 emulation v2
2008-06-05 3:12 ` Marcelo Tosatti
@ 2008-06-05 7:56 ` Avi Kivity
0 siblings, 0 replies; 27+ messages in thread
From: Avi Kivity @ 2008-06-05 7:56 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Chris Wright, Glauber Costa, Anthony Liguori, kvm
Marcelo Tosatti wrote:
> On Wed, Jun 04, 2008 at 01:49:41PM +0300, Avi Kivity wrote:
>
>>> Test is 1 million gettimeofday calls, Xeon 1.60GHz with 4MB L2.
>>>
>>> guest (qemu emulation):
>>> cycles:1189759332
>>>
>>> guest (in-kernel emulation):
>>> cycles:628046412
>>>
>>> guest (direct pmtimer):
>>> cycles:230372934
>>>
>>> host (TSC):
>>> cycles:14862774
>>>
>>>
>> Ratio is 1:15:80
>>
>> Looks like direct pmtimer is still quite slow. Are there any exits with
>> direct pmtimer, or is it all due to the ioport latency?
>>
>
> host (pmtimer):
> cycles:225768390
>
> So it's getting close to native performance. As you mentioned earlier,
> acpi_pm can't benefit from vsyscalls.
>
>
Yes, but the host will use the TSC (on a modern host with a stable TSC).
--
error compiling committee.c: too many arguments to function
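To check which clocksource a Linux host is actually using, read /sys/devices/system/clocksource/clocksource0/current_clocksource. On a host with a stable TSC this typically says "tsc"; otherwise it may say "acpi_pm", for example. The sketch below assumes that sysfs layout, which depends on kernel version and configuration, and read_clocksource() is a hypothetical helper.

```c
#include <stdio.h>
#include <string.h>

#define CURRENT_CLOCKSOURCE \
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"

/* Read the first line of a sysfs clocksource file into buf and strip the
 * trailing newline. Returns 0 on success, -1 if the file can't be read. */
static int read_clocksource(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

Call it as read_clocksource(CURRENT_CLOCKSOURCE, buf, sizeof buf). The available_clocksource file in the same directory lists the alternatives the kernel could switch to.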