* [PATCHv2 1/7] apic: fix typo EIO_ACK -> EOI_ACK and document
[not found] <cover.1336679924.git.mst@redhat.com>
@ 2012-05-11 7:38 ` Michael S. Tsirkin
2012-05-11 7:38 ` [PATCHv2 2/7] apic: use symbolic APIC_EOI_ACK Michael S. Tsirkin
` (6 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Michael S. Tsirkin @ 2012-05-11 7:38 UTC (permalink / raw)
To: x86, kvm
Cc: Ingo Molnar, H. Peter Anvin, Avi Kivity, Marcelo Tosatti, gleb,
Linus Torvalds, linux-kernel
Fix typo in the macro name and document the
reason it has this value. Update users.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
arch/x86/include/asm/apicdef.h | 2 +-
arch/x86/platform/visws/visws_quirks.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h
index 134bba0..c46bb99 100644
--- a/arch/x86/include/asm/apicdef.h
+++ b/arch/x86/include/asm/apicdef.h
@@ -37,7 +37,7 @@
#define APIC_ARBPRI_MASK 0xFFu
#define APIC_PROCPRI 0xA0
#define APIC_EOI 0xB0
-#define APIC_EIO_ACK 0x0
+#define APIC_EOI_ACK 0x0 /* Docs say 0 for future compat. */
#define APIC_RRR 0xC0
#define APIC_LDR 0xD0
#define APIC_LDR_MASK (0xFFu << 24)
diff --git a/arch/x86/platform/visws/visws_quirks.c b/arch/x86/platform/visws/visws_quirks.c
index c7abf13..94d8a39 100644
--- a/arch/x86/platform/visws/visws_quirks.c
+++ b/arch/x86/platform/visws/visws_quirks.c
@@ -445,7 +445,7 @@ static void ack_cobalt_irq(struct irq_data *data)
spin_lock_irqsave(&cobalt_lock, flags);
disable_cobalt_irq(data);
- apic_write(APIC_EOI, APIC_EIO_ACK);
+ apic_write(APIC_EOI, APIC_EOI_ACK);
spin_unlock_irqrestore(&cobalt_lock, flags);
}
--
MST
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCHv2 2/7] apic: use symbolic APIC_EOI_ACK
[not found] <cover.1336679924.git.mst@redhat.com>
2012-05-11 7:38 ` [PATCHv2 1/7] apic: fix typo EIO_ACK -> EOI_ACK and document Michael S. Tsirkin
@ 2012-05-11 7:38 ` Michael S. Tsirkin
2012-05-11 7:38 ` [PATCHv2 3/7] x86: add apic->eoi_write callback Michael S. Tsirkin
` (5 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Michael S. Tsirkin @ 2012-05-11 7:38 UTC (permalink / raw)
To: x86, kvm
Cc: Ingo Molnar, H. Peter Anvin, Avi Kivity, Marcelo Tosatti, gleb,
Linus Torvalds, linux-kernel
Use the symbol instead of hard-coded numbers,
now that the reason for the value is documented
where the constant is defined we don't need to
duplicate this explanation in code.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
arch/x86/include/asm/apic.h | 4 +---
1 files changed, 1 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index d854101..a09e9ab 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -463,9 +463,7 @@ static inline void ack_APIC_irq(void)
* ack_APIC_irq() actually gets compiled as a single instruction
* ... yummie.
*/
-
- /* Docs say use 0 for future compatibility */
- apic_write(APIC_EOI, 0);
+ apic_write(APIC_EOI, APIC_EOI_ACK);
}
static inline unsigned default_get_apic_id(unsigned long x)
--
MST
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCHv2 3/7] x86: add apic->eoi_write callback
[not found] <cover.1336679924.git.mst@redhat.com>
2012-05-11 7:38 ` [PATCHv2 1/7] apic: fix typo EIO_ACK -> EOI_ACK and document Michael S. Tsirkin
2012-05-11 7:38 ` [PATCHv2 2/7] apic: use symbolic APIC_EOI_ACK Michael S. Tsirkin
@ 2012-05-11 7:38 ` Michael S. Tsirkin
2012-05-11 7:38 ` [PATCHv2 4/7] x86: eoi micro-optimization Michael S. Tsirkin
` (4 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Michael S. Tsirkin @ 2012-05-11 7:38 UTC (permalink / raw)
To: x86, kvm
Cc: Ingo Molnar, H. Peter Anvin, Avi Kivity, Marcelo Tosatti, gleb,
Linus Torvalds, linux-kernel
Add eoi_write callback so that kvm can override
eoi accesses without touching the rest of the apic.
As a side-effect, this will enable a micro-optimization
for apics using msr.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
arch/x86/include/asm/apic.h | 15 ++++++++++++++-
arch/x86/kernel/apic/apic_flat_64.c | 2 ++
arch/x86/kernel/apic/apic_noop.c | 1 +
arch/x86/kernel/apic/apic_numachip.c | 1 +
arch/x86/kernel/apic/bigsmp_32.c | 1 +
arch/x86/kernel/apic/es7000_32.c | 2 ++
arch/x86/kernel/apic/numaq_32.c | 1 +
arch/x86/kernel/apic/probe_32.c | 1 +
arch/x86/kernel/apic/summit_32.c | 1 +
arch/x86/kernel/apic/x2apic_cluster.c | 1 +
arch/x86/kernel/apic/x2apic_phys.c | 1 +
arch/x86/kernel/apic/x2apic_uv_x.c | 1 +
12 files changed, 27 insertions(+), 1 deletions(-)
diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index a09e9ab..74efb8d 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -351,6 +351,13 @@ struct apic {
/* apic ops */
u32 (*read)(u32 reg);
void (*write)(u32 reg, u32 v);
+ /*
+ * eoi_write has the same signature as write.
+ * Drivers can support both eoi_write and write by passing the same
+ * callback value. Kernel can override eoi_write and fall back
+ * on write for EOI.
+ */
+ void (*eoi_write)(u32 reg, u32 v);
u64 (*icr_read)(void);
void (*icr_write)(u32 low, u32 high);
void (*wait_icr_idle)(void);
@@ -426,6 +433,11 @@ static inline void apic_write(u32 reg, u32 val)
apic->write(reg, val);
}
+static inline void apic_eoi(void)
+{
+ apic->eoi_write(APIC_EOI, APIC_EOI_ACK);
+}
+
static inline u64 apic_icr_read(void)
{
return apic->icr_read();
@@ -450,6 +462,7 @@ static inline u32 safe_apic_wait_icr_idle(void)
static inline u32 apic_read(u32 reg) { return 0; }
static inline void apic_write(u32 reg, u32 val) { }
+static inline void apic_eoi(void) { }
static inline u64 apic_icr_read(void) { return 0; }
static inline void apic_icr_write(u32 low, u32 high) { }
static inline void apic_wait_icr_idle(void) { }
@@ -463,7 +476,7 @@ static inline void ack_APIC_irq(void)
* ack_APIC_irq() actually gets compiled as a single instruction
* ... yummie.
*/
- apic_write(APIC_EOI, APIC_EOI_ACK);
+ apic_eoi();
}
static inline unsigned default_get_apic_id(unsigned long x)
diff --git a/arch/x86/kernel/apic/apic_flat_64.c b/arch/x86/kernel/apic/apic_flat_64.c
index 359b689..0e881c4 100644
--- a/arch/x86/kernel/apic/apic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -227,6 +227,7 @@ static struct apic apic_flat = {
.read = native_apic_mem_read,
.write = native_apic_mem_write,
+ .eoi_write = native_apic_mem_write,
.icr_read = native_apic_icr_read,
.icr_write = native_apic_icr_write,
.wait_icr_idle = native_apic_wait_icr_idle,
@@ -386,6 +387,7 @@ static struct apic apic_physflat = {
.read = native_apic_mem_read,
.write = native_apic_mem_write,
+ .eoi_write = native_apic_mem_write,
.icr_read = native_apic_icr_read,
.icr_write = native_apic_icr_write,
.wait_icr_idle = native_apic_wait_icr_idle,
diff --git a/arch/x86/kernel/apic/apic_noop.c b/arch/x86/kernel/apic/apic_noop.c
index 634ae6c..a6e4c6e 100644
--- a/arch/x86/kernel/apic/apic_noop.c
+++ b/arch/x86/kernel/apic/apic_noop.c
@@ -181,6 +181,7 @@ struct apic apic_noop = {
.read = noop_apic_read,
.write = noop_apic_write,
+ .eoi_write = noop_apic_write,
.icr_read = noop_apic_icr_read,
.icr_write = noop_apic_icr_write,
.wait_icr_idle = noop_apic_wait_icr_idle,
diff --git a/arch/x86/kernel/apic/apic_numachip.c b/arch/x86/kernel/apic/apic_numachip.c
index 23e7542..6ec6d5d 100644
--- a/arch/x86/kernel/apic/apic_numachip.c
+++ b/arch/x86/kernel/apic/apic_numachip.c
@@ -295,6 +295,7 @@ static struct apic apic_numachip __refconst = {
.read = native_apic_mem_read,
.write = native_apic_mem_write,
+ .eoi_write = native_apic_mem_write,
.icr_read = native_apic_icr_read,
.icr_write = native_apic_icr_write,
.wait_icr_idle = native_apic_wait_icr_idle,
diff --git a/arch/x86/kernel/apic/bigsmp_32.c b/arch/x86/kernel/apic/bigsmp_32.c
index 0cdec70..31fbdbf 100644
--- a/arch/x86/kernel/apic/bigsmp_32.c
+++ b/arch/x86/kernel/apic/bigsmp_32.c
@@ -248,6 +248,7 @@ static struct apic apic_bigsmp = {
.read = native_apic_mem_read,
.write = native_apic_mem_write,
+ .eoi_write = native_apic_mem_write,
.icr_read = native_apic_icr_read,
.icr_write = native_apic_icr_write,
.wait_icr_idle = native_apic_wait_icr_idle,
diff --git a/arch/x86/kernel/apic/es7000_32.c b/arch/x86/kernel/apic/es7000_32.c
index e42d1d3b9..db4ab1b 100644
--- a/arch/x86/kernel/apic/es7000_32.c
+++ b/arch/x86/kernel/apic/es7000_32.c
@@ -678,6 +678,7 @@ static struct apic __refdata apic_es7000_cluster = {
.read = native_apic_mem_read,
.write = native_apic_mem_write,
+ .eoi_write = native_apic_mem_write,
.icr_read = native_apic_icr_read,
.icr_write = native_apic_icr_write,
.wait_icr_idle = native_apic_wait_icr_idle,
@@ -742,6 +743,7 @@ static struct apic __refdata apic_es7000 = {
.read = native_apic_mem_read,
.write = native_apic_mem_write,
+ .eoi_write = native_apic_mem_write,
.icr_read = native_apic_icr_read,
.icr_write = native_apic_icr_write,
.wait_icr_idle = native_apic_wait_icr_idle,
diff --git a/arch/x86/kernel/apic/numaq_32.c b/arch/x86/kernel/apic/numaq_32.c
index 00d2422..f00a68c 100644
--- a/arch/x86/kernel/apic/numaq_32.c
+++ b/arch/x86/kernel/apic/numaq_32.c
@@ -530,6 +530,7 @@ static struct apic __refdata apic_numaq = {
.read = native_apic_mem_read,
.write = native_apic_mem_write,
+ .eoi_write = native_apic_mem_write,
.icr_read = native_apic_icr_read,
.icr_write = native_apic_icr_write,
.wait_icr_idle = native_apic_wait_icr_idle,
diff --git a/arch/x86/kernel/apic/probe_32.c b/arch/x86/kernel/apic/probe_32.c
index ff2c1b9..1b291da 100644
--- a/arch/x86/kernel/apic/probe_32.c
+++ b/arch/x86/kernel/apic/probe_32.c
@@ -142,6 +142,7 @@ static struct apic apic_default = {
.read = native_apic_mem_read,
.write = native_apic_mem_write,
+ .eoi_write = native_apic_mem_write,
.icr_read = native_apic_icr_read,
.icr_write = native_apic_icr_write,
.wait_icr_idle = native_apic_wait_icr_idle,
diff --git a/arch/x86/kernel/apic/summit_32.c b/arch/x86/kernel/apic/summit_32.c
index fea000b..659897c 100644
--- a/arch/x86/kernel/apic/summit_32.c
+++ b/arch/x86/kernel/apic/summit_32.c
@@ -546,6 +546,7 @@ static struct apic apic_summit = {
.read = native_apic_mem_read,
.write = native_apic_mem_write,
+ .eoi_write = native_apic_mem_write,
.icr_read = native_apic_icr_read,
.icr_write = native_apic_icr_write,
.wait_icr_idle = native_apic_wait_icr_idle,
diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c
index 48f3103..a5baa78 100644
--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -260,6 +260,7 @@ static struct apic apic_x2apic_cluster = {
.read = native_apic_msr_read,
.write = native_apic_msr_write,
+ .eoi_write = native_apic_msr_write,
.icr_read = native_x2apic_icr_read,
.icr_write = native_x2apic_icr_write,
.wait_icr_idle = native_x2apic_wait_icr_idle,
diff --git a/arch/x86/kernel/apic/x2apic_phys.c b/arch/x86/kernel/apic/x2apic_phys.c
index 991e315..8340356 100644
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -172,6 +172,7 @@ static struct apic apic_x2apic_phys = {
.read = native_apic_msr_read,
.write = native_apic_msr_write,
+ .eoi_write = native_apic_msr_write,
.icr_read = native_x2apic_icr_read,
.icr_write = native_x2apic_icr_write,
.wait_icr_idle = native_x2apic_wait_icr_idle,
diff --git a/arch/x86/kernel/apic/x2apic_uv_x.c b/arch/x86/kernel/apic/x2apic_uv_x.c
index 87bfa69..5b0e3d0 100644
--- a/arch/x86/kernel/apic/x2apic_uv_x.c
+++ b/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -404,6 +404,7 @@ static struct apic __refdata apic_x2apic_uv_x = {
.read = native_apic_msr_read,
.write = native_apic_msr_write,
+ .eoi_write = native_apic_msr_write,
.icr_read = native_x2apic_icr_read,
.icr_write = native_x2apic_icr_write,
.wait_icr_idle = native_x2apic_wait_icr_idle,
--
MST
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCHv2 4/7] x86: eoi micro-optimization
[not found] <cover.1336679924.git.mst@redhat.com>
` (2 preceding siblings ...)
2012-05-11 7:38 ` [PATCHv2 3/7] x86: add apic->eoi_write callback Michael S. Tsirkin
@ 2012-05-11 7:38 ` Michael S. Tsirkin
2012-05-11 7:38 ` [PATCHv2 5/7] kvm_para: guest side for eoi avoidance Michael S. Tsirkin
` (3 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Michael S. Tsirkin @ 2012-05-11 7:38 UTC (permalink / raw)
To: x86, kvm
Cc: Ingo Molnar, H. Peter Anvin, Avi Kivity, Marcelo Tosatti, gleb,
Linus Torvalds, linux-kernel
We know both register and value for eoi beforehand,
so there's no need to check it and no need to do math
to calculate the msr. Saves instructions/branches
on each EOI when using x2apic.
I'm not sure what kind of tests should one run
to check whether this patch is good for performance.
Some data below: in case it's insufficient,
this patch can be dropped from the series for now:
I looked at the objdump output to verify that the generated code
looks right and actually is shorter.
Some benchmark results below show a tiny
but measureable improvement. The tests were run on
an AMD box with 24 cpus.
- A clean kernel build after reboot shows
a tiny but measureable improvement in system time
which means lower CPU overhead (though not measureable
in total time - that is dominated by user time and fluctuates
too much):
linux# reboot -f
...
linux# make clean
linux# time make -j 64 LOCALVERSION= 2>&1 > /dev/null
Before:
real 2m52.244s
user 35m53.833s
sys 6m7.194s
After:
real 2m52.827s
user 35m48.916s
sys 6m2.305s
- perf micro-benchmarks seem to consistently show
a tiny improvement in total time as well but it's below
the confidence level of 3 std deviations:
# ./tools/perf/perf stat --sync --repeat 100 --null perf bench sched messaging
...
0.414666797 seconds time elapsed ( +- 1.29% )
Performance counter stats for 'perf bench sched messaging' (100 runs):
0.395370891 seconds time elapsed
( +- 1.04% )
# ./tools/perf/perf stat --sync --repeat 100 --null perf bench sched pipe -l 10000
0.307019664 seconds time elapsed
( +- 0.10% )
0.304738024 seconds time elapsed
( +- 0.08% )
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
arch/x86/include/asm/apic.h | 5 +++++
arch/x86/kernel/apic/x2apic_cluster.c | 2 +-
arch/x86/kernel/apic/x2apic_phys.c | 2 +-
arch/x86/kernel/apic/x2apic_uv_x.c | 2 +-
4 files changed, 8 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 74efb8d..5eb6d56 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -138,6 +138,11 @@ static inline void native_apic_msr_write(u32 reg, u32 v)
wrmsr(APIC_BASE_MSR + (reg >> 4), v, 0);
}
+static inline void native_apic_msr_eoi_write(u32 reg, u32 v)
+{
+ wrmsr(APIC_BASE_MSR + (APIC_EOI >> 4), APIC_EOI_ACK, 0);
+}
+
static inline u32 native_apic_msr_read(u32 reg)
{
u64 msr;
diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c
index a5baa78..ff35cff 100644
--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -260,7 +260,7 @@ static struct apic apic_x2apic_cluster = {
.read = native_apic_msr_read,
.write = native_apic_msr_write,
- .eoi_write = native_apic_msr_write,
+ .eoi_write = native_apic_msr_eoi_write,
.icr_read = native_x2apic_icr_read,
.icr_write = native_x2apic_icr_write,
.wait_icr_idle = native_x2apic_wait_icr_idle,
diff --git a/arch/x86/kernel/apic/x2apic_phys.c b/arch/x86/kernel/apic/x2apic_phys.c
index 8340356..c17e982 100644
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -172,7 +172,7 @@ static struct apic apic_x2apic_phys = {
.read = native_apic_msr_read,
.write = native_apic_msr_write,
- .eoi_write = native_apic_msr_write,
+ .eoi_write = native_apic_msr_eoi_write,
.icr_read = native_x2apic_icr_read,
.icr_write = native_x2apic_icr_write,
.wait_icr_idle = native_x2apic_wait_icr_idle,
diff --git a/arch/x86/kernel/apic/x2apic_uv_x.c b/arch/x86/kernel/apic/x2apic_uv_x.c
index 5b0e3d0..c6d03f7 100644
--- a/arch/x86/kernel/apic/x2apic_uv_x.c
+++ b/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -404,7 +404,7 @@ static struct apic __refdata apic_x2apic_uv_x = {
.read = native_apic_msr_read,
.write = native_apic_msr_write,
- .eoi_write = native_apic_msr_write,
+ .eoi_write = native_apic_msr_eoi_write,
.icr_read = native_x2apic_icr_read,
.icr_write = native_x2apic_icr_write,
.wait_icr_idle = native_x2apic_wait_icr_idle,
--
MST
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCHv2 5/7] kvm_para: guest side for eoi avoidance
[not found] <cover.1336679924.git.mst@redhat.com>
` (3 preceding siblings ...)
2012-05-11 7:38 ` [PATCHv2 4/7] x86: eoi micro-optimization Michael S. Tsirkin
@ 2012-05-11 7:38 ` Michael S. Tsirkin
2012-05-11 7:38 ` [PATCHv2 6/7] x86/bitops: note on __test_and_clear_bit atomicity Michael S. Tsirkin
` (2 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Michael S. Tsirkin @ 2012-05-11 7:38 UTC (permalink / raw)
To: x86, kvm
Cc: Ingo Molnar, H. Peter Anvin, Avi Kivity, Marcelo Tosatti, gleb,
Linus Torvalds, linux-kernel
The idea is simple: there's a bit, per APIC, in guest memory,
that tells the guest that it does not need EOI.
Guest tests it using a single est and clear operation - this is
necessary so that host can detect interrupt nesting - and if set, it can
skip the EOI MSR.
A simple microbenchmark shows exit reduction:
# perf stat -e 'kvm:*' -a sleep 1s
Performance counter stats for 'sleep 1s':
47,357 kvm:kvm_entry [99.98%]
0 kvm:kvm_hypercall [99.98%]
0 kvm:kvm_hv_hypercall [99.98%]
5,001 kvm:kvm_pio [99.98%]
0 kvm:kvm_cpuid [99.98%]
22,124 kvm:kvm_apic [99.98%]
49,849 kvm:kvm_exit [99.98%]
21,115 kvm:kvm_inj_virq [99.98%]
0 kvm:kvm_inj_exception [99.98%]
0 kvm:kvm_page_fault [99.98%]
22,937 kvm:kvm_msr [99.98%]
0 kvm:kvm_cr [99.98%]
0 kvm:kvm_pic_set_irq [99.98%]
0 kvm:kvm_apic_ipi [99.98%]
22,207 kvm:kvm_apic_accept_irq [99.98%]
22,421 kvm:kvm_eoi [99.98%]
0 kvm:kvm_pv_eoi [99.99%]
0 kvm:kvm_nested_vmrun [99.99%]
0 kvm:kvm_nested_intercepts [99.99%]
0 kvm:kvm_nested_vmexit [99.99%]
0 kvm:kvm_nested_vmexit_inject [99.99%]
0 kvm:kvm_nested_intr_vmexit [99.99%]
0 kvm:kvm_invlpga [99.99%]
0 kvm:kvm_skinit [99.99%]
57 kvm:kvm_emulate_insn [99.99%]
0 kvm:vcpu_match_mmio [99.99%]
0 kvm:kvm_userspace_exit [99.99%]
2 kvm:kvm_set_irq [99.99%]
2 kvm:kvm_ioapic_set_irq [99.99%]
23,609 kvm:kvm_msi_set_irq [99.99%]
1 kvm:kvm_ack_irq [99.99%]
131 kvm:kvm_mmio [99.99%]
226 kvm:kvm_fpu [100.00%]
0 kvm:kvm_age_page [100.00%]
0 kvm:kvm_try_async_get_page [100.00%]
0 kvm:kvm_async_pf_doublefault [100.00%]
0 kvm:kvm_async_pf_not_present [100.00%]
0 kvm:kvm_async_pf_ready [100.00%]
0 kvm:kvm_async_pf_completed
1.002100578 seconds time elapsed
# perf stat -e 'kvm:*' -a sleep 1s
Performance counter stats for 'sleep 1s':
28,354 kvm:kvm_entry [99.98%]
0 kvm:kvm_hypercall [99.98%]
0 kvm:kvm_hv_hypercall [99.98%]
1,347 kvm:kvm_pio [99.98%]
0 kvm:kvm_cpuid [99.98%]
1,931 kvm:kvm_apic [99.98%]
29,595 kvm:kvm_exit [99.98%]
24,884 kvm:kvm_inj_virq [99.98%]
0 kvm:kvm_inj_exception [99.98%]
0 kvm:kvm_page_fault [99.98%]
1,986 kvm:kvm_msr [99.98%]
0 kvm:kvm_cr [99.98%]
0 kvm:kvm_pic_set_irq [99.98%]
0 kvm:kvm_apic_ipi [99.99%]
25,953 kvm:kvm_apic_accept_irq [99.99%]
26,132 kvm:kvm_eoi [99.99%]
26,593 kvm:kvm_pv_eoi [99.99%]
0 kvm:kvm_nested_vmrun [99.99%]
0 kvm:kvm_nested_intercepts [99.99%]
0 kvm:kvm_nested_vmexit [99.99%]
0 kvm:kvm_nested_vmexit_inject [99.99%]
0 kvm:kvm_nested_intr_vmexit [99.99%]
0 kvm:kvm_invlpga [99.99%]
0 kvm:kvm_skinit [99.99%]
284 kvm:kvm_emulate_insn [99.99%]
68 kvm:vcpu_match_mmio [99.99%]
68 kvm:kvm_userspace_exit [99.99%]
2 kvm:kvm_set_irq [99.99%]
2 kvm:kvm_ioapic_set_irq [99.99%]
28,288 kvm:kvm_msi_set_irq [99.99%]
1 kvm:kvm_ack_irq [99.99%]
131 kvm:kvm_mmio [100.00%]
588 kvm:kvm_fpu [100.00%]
0 kvm:kvm_age_page [100.00%]
0 kvm:kvm_try_async_get_page [100.00%]
0 kvm:kvm_async_pf_doublefault [100.00%]
0 kvm:kvm_async_pf_not_present [100.00%]
0 kvm:kvm_async_pf_ready [100.00%]
0 kvm:kvm_async_pf_completed
1.002039622 seconds time elapsed
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
arch/x86/include/asm/bitops.h | 6 +++-
arch/x86/kernel/kvm.c | 50 ++++++++++++++++++++++++++++++++++++++--
2 files changed, 51 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index b97596e..c9c70ea 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -26,11 +26,13 @@
#if __GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 1)
/* Technically wrong, but this avoids compilation errors on some gcc
versions. */
-#define BITOP_ADDR(x) "=m" (*(volatile long *) (x))
+#define BITOP_ADDR_CONSTRAINT "=m"
#else
-#define BITOP_ADDR(x) "+m" (*(volatile long *) (x))
+#define BITOP_ADDR_CONSTRAINT "+m"
#endif
+#define BITOP_ADDR(x) BITOP_ADDR_CONSTRAINT (*(volatile long *) (x))
+
#define ADDR BITOP_ADDR(addr)
/*
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index e554e5a..31f85cb 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -39,6 +39,8 @@
#include <asm/desc.h>
#include <asm/tlbflush.h>
#include <asm/idle.h>
+#include <asm/apic.h>
+#include <asm/apicdef.h>
static int kvmapf = 1;
@@ -283,6 +285,23 @@ static void kvm_register_steal_time(void)
cpu, __pa(st));
}
+/* size alignment is implied but just to make it explicit. */
+static DEFINE_PER_CPU(unsigned long, kvm_apic_eoi) __aligned(2) = 0;
+
+static void kvm_guest_apic_eoi_write(u32 reg, u32 val)
+{
+ /**
+ * This relies on __test_and_clear_bit to modify the memory
+ * in a way that is atomic with respect to the local CPU.
+ * The hypervisor only accesses this memory from the local CPU so
+ * there's no need for lock or memory barriers.
+ * An optimization barrier is implied in apic write.
+ */
+ if (__test_and_clear_bit(0, &__get_cpu_var(kvm_apic_eoi)))
+ return;
+ apic->write(APIC_EOI, APIC_EOI_ACK);
+}
+
void __cpuinit kvm_guest_cpu_init(void)
{
if (!kvm_para_available())
@@ -300,11 +319,17 @@ void __cpuinit kvm_guest_cpu_init(void)
smp_processor_id());
}
+ if (kvm_para_has_feature(KVM_FEATURE_EOI)) {
+ __get_cpu_var(kvm_apic_eoi) = 0;
+ wrmsrl(MSR_KVM_EOI_EN, __pa(&__get_cpu_var(kvm_apic_eoi)) |
+ KVM_MSR_ENABLED);
+ }
+
if (has_steal_clock)
kvm_register_steal_time();
}
-static void kvm_pv_disable_apf(void *unused)
+static void kvm_pv_disable_apf(void)
{
if (!__get_cpu_var(apf_reason).enabled)
return;
@@ -316,11 +341,18 @@ static void kvm_pv_disable_apf(void *unused)
smp_processor_id());
}
+static void kvm_pv_guest_cpu_reboot(void *unused)
+{
+ if (kvm_para_has_feature(KVM_FEATURE_EOI))
+ wrmsrl(MSR_KVM_EOI_EN, 0);
+ kvm_pv_disable_apf();
+}
+
static int kvm_pv_reboot_notify(struct notifier_block *nb,
unsigned long code, void *unused)
{
if (code == SYS_RESTART)
- on_each_cpu(kvm_pv_disable_apf, NULL, 1);
+ on_each_cpu(kvm_pv_guest_cpu_reboot, NULL, 1);
return NOTIFY_DONE;
}
@@ -371,7 +403,9 @@ static void __cpuinit kvm_guest_cpu_online(void *dummy)
static void kvm_guest_cpu_offline(void *dummy)
{
kvm_disable_steal_time();
- kvm_pv_disable_apf(NULL);
+ if (kvm_para_has_feature(KVM_FEATURE_EOI))
+ wrmsrl(MSR_KVM_EOI_EN, 0);
+ kvm_pv_disable_apf();
apf_task_wake_all();
}
@@ -424,6 +458,16 @@ void __init kvm_guest_init(void)
pv_time_ops.steal_clock = kvm_steal_clock;
}
+ if (kvm_para_has_feature(KVM_FEATURE_EOI)) {
+ struct apic **drv;
+
+ for (drv = __apicdrivers; drv < __apicdrivers_end; drv++) {
+ /* Should happen once for each apic */
+ WARN_ON((*drv)->eoi_write == kvm_guest_apic_eoi_write);
+ (*drv)->eoi_write = kvm_guest_apic_eoi_write;
+ }
+ }
+
#ifdef CONFIG_SMP
smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
register_cpu_notifier(&kvm_cpu_notifier);
--
MST
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCHv2 6/7] x86/bitops: note on __test_and_clear_bit atomicity
[not found] <cover.1336679924.git.mst@redhat.com>
` (4 preceding siblings ...)
2012-05-11 7:38 ` [PATCHv2 5/7] kvm_para: guest side for eoi avoidance Michael S. Tsirkin
@ 2012-05-11 7:38 ` Michael S. Tsirkin
2012-05-11 7:38 ` [PATCHv2 7/7] kvm: host side for eoi optimization Michael S. Tsirkin
2012-05-11 7:43 ` [PATCH] qemu: whitelist kvm pv eoi feature Michael S. Tsirkin
7 siblings, 0 replies; 12+ messages in thread
From: Michael S. Tsirkin @ 2012-05-11 7:38 UTC (permalink / raw)
To: x86, kvm
Cc: Ingo Molnar, H. Peter Anvin, Avi Kivity, Marcelo Tosatti, gleb,
Linus Torvalds, linux-kernel
__test_and_clear_bit is actually atomic with respect
to the local CPU. Add a note saying that KVM on x86
relies on this behaviour so people don't accidentaly break it.
Also warn not to rely on this in portable code.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
arch/x86/include/asm/bitops.h | 7 +++++++
1 files changed, 7 insertions(+), 0 deletions(-)
diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index c9c70ea..86f3a1e 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -264,6 +264,13 @@ static inline int test_and_clear_bit(int nr, volatile unsigned long *addr)
* This operation is non-atomic and can be reordered.
* If two examples of this operation race, one can appear to succeed
* but actually fail. You must protect multiple accesses with a lock.
+ *
+ * Note: the operation is performed atomically with respect to
+ * the local CPU, but not other CPUs. Portable code should not
+ * rely on this behaviour.
+ * KVM relies on this behaviour on x86 for modifying memory that is also
+ * accessed from a hypervisor on the same CPU if running in a VM: don't change
+ * this without also updating arch/x86/kernel/kvm.c
*/
static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
{
--
MST
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCHv2 7/7] kvm: host side for eoi optimization
[not found] <cover.1336679924.git.mst@redhat.com>
` (5 preceding siblings ...)
2012-05-11 7:38 ` [PATCHv2 6/7] x86/bitops: note on __test_and_clear_bit atomicity Michael S. Tsirkin
@ 2012-05-11 7:38 ` Michael S. Tsirkin
2012-05-13 9:33 ` Gleb Natapov
2012-05-11 7:43 ` [PATCH] qemu: whitelist kvm pv eoi feature Michael S. Tsirkin
7 siblings, 1 reply; 12+ messages in thread
From: Michael S. Tsirkin @ 2012-05-11 7:38 UTC (permalink / raw)
To: x86, kvm
Cc: Ingo Molnar, H. Peter Anvin, Avi Kivity, Marcelo Tosatti, gleb,
Linus Torvalds, linux-kernel
Implementation of PV EOI using shared memory.
This reduces the number of exits an interrupt
causes as much as by half.
The idea is simple: there's a bit, per APIC, in guest memory,
that tells the guest that it does not need EOI.
We set it before injecting an interrupt and clear
before injecting a nested one. Guest tests it using
a test and clear operation - this is necessary
so that host can detect interrupt nesting -
and if set, it can skip the EOI MSR.
There's a new MSR to set the address of said register
in guest memory. Otherwise not much changed:
- Guest EOI is not required
- Register is tested & ISR is automatically cleared on exit
This patch is on top of kvm.git.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
arch/x86/include/asm/kvm_host.h | 6 ++
arch/x86/include/asm/kvm_para.h | 2 +
arch/x86/kvm/cpuid.c | 1 +
arch/x86/kvm/irq.c | 2 +-
arch/x86/kvm/lapic.c | 109 +++++++++++++++++++++++++++++++++++++--
arch/x86/kvm/lapic.h | 2 +
arch/x86/kvm/trace.h | 34 ++++++++++++
arch/x86/kvm/x86.c | 8 +++-
8 files changed, 157 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 32297f2..7673f4d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -174,6 +174,7 @@ enum {
/* apic attention bits */
#define KVM_APIC_CHECK_VAPIC 0
+#define KVM_APIC_EOI_PENDING 1
/*
* We don't want allocation failures within the mmu code, so we preallocate
@@ -485,6 +486,11 @@ struct kvm_vcpu_arch {
u64 length;
u64 status;
} osvw;
+
+ struct {
+ u64 msr_val;
+ struct gfn_to_hva_cache data;
+ } eoi;
};
struct kvm_lpage_info {
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 734c376..195840d 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -22,6 +22,7 @@
#define KVM_FEATURE_CLOCKSOURCE2 3
#define KVM_FEATURE_ASYNC_PF 4
#define KVM_FEATURE_STEAL_TIME 5
+#define KVM_FEATURE_EOI 6
/* The last 8 bits are used to indicate how to interpret the flags field
* in pvclock structure. If no bits are set, all flags are ignored.
@@ -37,6 +38,7 @@
#define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
#define MSR_KVM_ASYNC_PF_EN 0x4b564d02
#define MSR_KVM_STEAL_TIME 0x4b564d03
+#define MSR_KVM_EOI_EN 0x4b564d04
struct kvm_steal_time {
__u64 steal;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index eba8345..bda4877 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -408,6 +408,7 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
(1 << KVM_FEATURE_NOP_IO_DELAY) |
(1 << KVM_FEATURE_CLOCKSOURCE2) |
(1 << KVM_FEATURE_ASYNC_PF) |
+ (1 << KVM_FEATURE_EOI) |
(1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
if (sched_info_on())
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index 7e06ba1..07bdfab 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -48,7 +48,7 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
if (!irqchip_in_kernel(v->kvm))
return v->arch.interrupt.pending;
- if (kvm_apic_has_interrupt(v) == -1) { /* LAPIC */
+ if (kvm_apic_has_interrupt(v) < 0) { /* LAPIC */
if (kvm_apic_accept_pic_intr(v)) {
s = pic_irqchip(v->kvm); /* PIC */
return s->output;
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 93c1574..77e0244 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -270,6 +270,54 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq)
irq->level, irq->trig_mode);
}
+static int eoi_put_user(struct kvm_vcpu *vcpu, u8 val)
+{
+
+ return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.eoi.data, &val,
+ sizeof(val));
+}
+
+static int eoi_get_user(struct kvm_vcpu *vcpu, u8 *val)
+{
+
+ return kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.eoi.data, val,
+ sizeof(*val));
+}
+
+static inline bool eoi_enabled(struct kvm_vcpu *vcpu)
+{
+ return vcpu->arch.eoi.msr_val & KVM_MSR_ENABLED;
+}
+
+static bool eoi_get_pending(struct kvm_vcpu *vcpu)
+{
+ u8 val;
+ if (eoi_get_user(vcpu, &val) < 0)
+ apic_debug("Can't read EOI MSR value: 0x%llx\n",
+ (unsigned long long)vcpi->arch.eoi.msr_val);
+ return val & 0x1;
+}
+
+static void eoi_set_pending(struct kvm_vcpu *vcpu)
+{
+ if (eoi_put_user(vcpu, 0x1) < 0) {
+ apic_debug("Can't set EOI MSR value: 0x%llx\n",
+ (unsigned long long)vcpi->arch.eoi.msr_val);
+ return;
+ }
+ __set_bit(KVM_APIC_EOI_PENDING, &vcpu->arch.apic_attention);
+}
+
+static void eoi_clr_pending(struct kvm_vcpu *vcpu)
+{
+ if (eoi_put_user(vcpu, 0x0) < 0) {
+ apic_debug("Can't clear EOI MSR value: 0x%llx\n",
+ (unsigned long long)vcpi->arch.eoi.msr_val);
+ return;
+ }
+ __clear_bit(KVM_APIC_EOI_PENDING, &vcpu->arch.apic_attention);
+}
+
static inline int apic_find_highest_isr(struct kvm_lapic *apic)
{
int result;
@@ -482,16 +530,21 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio;
}
-static void apic_set_eoi(struct kvm_lapic *apic)
+static int apic_set_eoi(struct kvm_lapic *apic)
{
int vector = apic_find_highest_isr(apic);
+
+ trace_kvm_eoi(apic, vector);
+
/*
* Not every write EOI will has corresponding ISR,
* one example is when Kernel check timer on setup_IO_APIC
*/
if (vector == -1)
- return;
+ return vector;
+ if (eoi_enabled(apic->vcpu))
+ eoi_clr_pending(apic->vcpu);
apic_clear_vector(vector, apic->regs + APIC_ISR);
apic_update_ppr(apic);
@@ -505,6 +558,7 @@ static void apic_set_eoi(struct kvm_lapic *apic)
kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
}
kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
+ return vector;
}
static void apic_send_ipi(struct kvm_lapic *apic)
@@ -1085,6 +1139,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
atomic_set(&apic->lapic_timer.pending, 0);
if (kvm_vcpu_is_bsp(vcpu))
vcpu->arch.apic_base |= MSR_IA32_APICBASE_BSP;
+ vcpu->arch.eoi.msr_val = 0;
apic_update_ppr(apic);
vcpu->arch.apic_arb_prio = 0;
@@ -1211,9 +1266,10 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
apic_update_ppr(apic);
highest_irr = apic_find_highest_irr(apic);
- if ((highest_irr == -1) ||
- ((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)))
+ if (highest_irr == -1)
return -1;
+ if (((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)))
+ return -2;
return highest_irr;
}
@@ -1245,9 +1301,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
int vector = kvm_apic_has_interrupt(vcpu);
struct kvm_lapic *apic = vcpu->arch.apic;
- if (vector == -1)
+ /* Detect interrupt nesting and disable EOI optimization */
+ if (eoi_enabled(vcpu) && vector == -2)
+ eoi_clr_pending(vcpu);
+
+ if (vector < 0)
return -1;
+ if (eoi_enabled(vcpu)) {
+ if (kvm_ioapic_handles_vector(vcpu->kvm, vector))
+ eoi_clr_pending(vcpu);
+ else
+ eoi_set_pending(vcpu);
+ }
+
apic_set_vector(vector, apic->regs + APIC_ISR);
apic_update_ppr(apic);
apic_clear_irr(vector, apic);
@@ -1262,6 +1329,14 @@ void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu)
MSR_IA32_APICBASE_BASE;
kvm_apic_set_version(vcpu);
+ if (eoi_enabled(apic->vcpu)) {
+ if (eoi_get_pending(apic->vcpu))
+ __set_bit(KVM_APIC_EOI_PENDING,
+ &vcpu->arch.apic_attention);
+ else
+ __clear_bit(KVM_APIC_EOI_PENDING,
+ &vcpu->arch.apic_attention);
+ }
apic_update_ppr(apic);
hrtimer_cancel(&apic->lapic_timer.timer);
update_divide_count(apic);
@@ -1283,11 +1358,25 @@ void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
}
+static void apic_update_eoi(struct kvm_lapic *apic)
+{
+ int vector;
+ BUG_ON(!eoi_enabled(apic->vcpu));
+ if (eoi_get_pending(apic->vcpu))
+ return;
+ __clear_bit(KVM_APIC_EOI_PENDING, &apic->vcpu->arch.apic_attention);
+ vector = apic_set_eoi(apic);
+ trace_kvm_pv_eoi(apic, vector);
+}
+
void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu)
{
u32 data;
void *vapic;
+ if (test_bit(KVM_APIC_EOI_PENDING, &vcpu->arch.apic_attention))
+ apic_update_eoi(vcpu->arch.apic);
+
if (!test_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention))
return;
@@ -1394,3 +1483,13 @@ int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, u32 reg, u64 *data)
return 0;
}
+
+int kvm_pv_enable_apic_eoi(struct kvm_vcpu *vcpu, u64 data)
+{
+ u64 addr = data & ~KVM_MSR_ENABLED;
+ if (eoi_enabled(vcpu))
+ eoi_clr_pending(vcpu);
+ vcpu->arch.eoi.msr_val = data;
+ kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.eoi.data, addr);
+ return 0;
+}
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 6f4ce25..042dace 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -60,4 +60,6 @@ static inline bool kvm_hv_vapic_assist_page_enabled(struct kvm_vcpu *vcpu)
{
return vcpu->arch.hv_vapic & HV_X64_MSR_APIC_ASSIST_PAGE_ENABLE;
}
+
+int kvm_pv_enable_apic_eoi(struct kvm_vcpu *vcpu, u64 data);
#endif
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 911d264..851914e 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -517,6 +517,40 @@ TRACE_EVENT(kvm_apic_accept_irq,
__entry->coalesced ? " (coalesced)" : "")
);
+TRACE_EVENT(kvm_eoi,
+ TP_PROTO(struct kvm_lapic *apic, int vector),
+ TP_ARGS(apic, vector),
+
+ TP_STRUCT__entry(
+ __field( __u32, apicid )
+ __field( int, vector )
+ ),
+
+ TP_fast_assign(
+ __entry->apicid = apic->vcpu->vcpu_id;
+ __entry->vector = vector;
+ ),
+
+ TP_printk("apicid %x vector %d", __entry->apicid, __entry->vector)
+);
+
+TRACE_EVENT(kvm_pv_eoi,
+ TP_PROTO(struct kvm_lapic *apic, int vector),
+ TP_ARGS(apic, vector),
+
+ TP_STRUCT__entry(
+ __field( __u32, apicid )
+ __field( int, vector )
+ ),
+
+ TP_fast_assign(
+ __entry->apicid = apic->vcpu->vcpu_id;
+ __entry->vector = vector;
+ ),
+
+ TP_printk("apicid %x vector %d", __entry->apicid, __entry->vector)
+);
+
/*
* Tracepoint for nested VMRUN
*/
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 185a2b8..bfcd97d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -795,6 +795,7 @@ static u32 msrs_to_save[] = {
MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
+ MSR_KVM_EOI_EN,
MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
MSR_STAR,
#ifdef CONFIG_X86_64
@@ -1653,6 +1654,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
kvm_make_request(KVM_REQ_STEAL_UPDATE, vcpu);
break;
+ case MSR_KVM_EOI_EN:
+ if (kvm_pv_enable_apic_eoi(vcpu, data))
+ return 1;
+ break;
case MSR_IA32_MCG_CTL:
case MSR_IA32_MCG_STATUS:
@@ -5370,7 +5375,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
if (unlikely(vcpu->arch.tsc_always_catchup))
kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
- kvm_lapic_sync_from_vapic(vcpu);
+ if (unlikely(vcpu->arch.apic_attention))
+ kvm_lapic_sync_from_vapic(vcpu);
r = kvm_x86_ops->handle_exit(vcpu);
out:
--
MST
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH] qemu: whitelist kvm pv eoi feature
[not found] <cover.1336679924.git.mst@redhat.com>
` (6 preceding siblings ...)
2012-05-11 7:38 ` [PATCHv2 7/7] kvm: host side for eoi optimization Michael S. Tsirkin
@ 2012-05-11 7:43 ` Michael S. Tsirkin
7 siblings, 0 replies; 12+ messages in thread
From: Michael S. Tsirkin @ 2012-05-11 7:43 UTC (permalink / raw)
To: x86, kvm
Cc: Ingo Molnar, H. Peter Anvin, Avi Kivity, Marcelo Tosatti, gleb,
Linus Torvalds, linux-kernel, qemu-devel, Anthony Liguori,
Andreas Färber, Markus Armbruster, Stefan Hajnoczi
Whitelist kvm pv eoi feature. The feature is enabled
with -cpu kvm64. To disable: -cpu kvm64,-kvm_eoi.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
Sending a copy to kernel list as this is needed
to test the pv eoi feature recently submitted.
target-i386/cpuid.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/target-i386/cpuid.c b/target-i386/cpuid.c
index 465ea15..c421b19 100644
--- a/target-i386/cpuid.c
+++ b/target-i386/cpuid.c
@@ -75,7 +75,7 @@ static const char *ext3_feature_name[] = {
};
static const char *kvm_feature_name[] = {
- "kvmclock", "kvm_nopiodelay", "kvm_mmu", "kvmclock", "kvm_asyncpf", NULL, NULL, NULL,
+ "kvmclock", "kvm_nopiodelay", "kvm_mmu", "kvmclock", "kvm_asyncpf", NULL, "kvm_eoi", NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
--
MST
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCHv2 7/7] kvm: host side for eoi optimization
2012-05-11 7:38 ` [PATCHv2 7/7] kvm: host side for eoi optimization Michael S. Tsirkin
@ 2012-05-13 9:33 ` Gleb Natapov
2012-05-13 10:01 ` Michael S. Tsirkin
2012-05-13 15:15 ` Michael S. Tsirkin
0 siblings, 2 replies; 12+ messages in thread
From: Gleb Natapov @ 2012-05-13 9:33 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: x86, kvm, Ingo Molnar, H. Peter Anvin, Avi Kivity,
Marcelo Tosatti, Linus Torvalds, linux-kernel
On Fri, May 11, 2012 at 10:38:40AM +0300, Michael S. Tsirkin wrote:
> Implementation of PV EOI using shared memory.
> This reduces the number of exits an interrupt
> causes as much as by half.
>
> The idea is simple: there's a bit, per APIC, in guest memory,
> that tells the guest that it does not need EOI.
> We set it before injecting an interrupt and clear
> before injecting a nested one. Guest tests it using
> a test and clear operation - this is necessary
> so that host can detect interrupt nesting -
> and if set, it can skip the EOI MSR.
>
> There's a new MSR to set the address of said register
> in guest memory. Otherwise not much changed:
> - Guest EOI is not required
> - Register is tested & ISR is automatically cleared on exit
>
> This patch is on top of kvm.git.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> arch/x86/include/asm/kvm_host.h | 6 ++
> arch/x86/include/asm/kvm_para.h | 2 +
> arch/x86/kvm/cpuid.c | 1 +
> arch/x86/kvm/irq.c | 2 +-
> arch/x86/kvm/lapic.c | 109 +++++++++++++++++++++++++++++++++++++--
> arch/x86/kvm/lapic.h | 2 +
> arch/x86/kvm/trace.h | 34 ++++++++++++
> arch/x86/kvm/x86.c | 8 +++-
Documentation ++++++ :)
> 8 files changed, 157 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 32297f2..7673f4d 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -174,6 +174,7 @@ enum {
>
> /* apic attention bits */
> #define KVM_APIC_CHECK_VAPIC 0
> +#define KVM_APIC_EOI_PENDING 1
>
> /*
> * We don't want allocation failures within the mmu code, so we preallocate
> @@ -485,6 +486,11 @@ struct kvm_vcpu_arch {
> u64 length;
> u64 status;
> } osvw;
> +
> + struct {
> + u64 msr_val;
> + struct gfn_to_hva_cache data;
> + } eoi;
> };
>
> struct kvm_lpage_info {
> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> index 734c376..195840d 100644
> --- a/arch/x86/include/asm/kvm_para.h
> +++ b/arch/x86/include/asm/kvm_para.h
> @@ -22,6 +22,7 @@
> #define KVM_FEATURE_CLOCKSOURCE2 3
> #define KVM_FEATURE_ASYNC_PF 4
> #define KVM_FEATURE_STEAL_TIME 5
> +#define KVM_FEATURE_EOI 6
>
> /* The last 8 bits are used to indicate how to interpret the flags field
> * in pvclock structure. If no bits are set, all flags are ignored.
> @@ -37,6 +38,7 @@
> #define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
> #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
> #define MSR_KVM_STEAL_TIME 0x4b564d03
> +#define MSR_KVM_EOI_EN 0x4b564d04
>
> struct kvm_steal_time {
> __u64 steal;
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index eba8345..bda4877 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -408,6 +408,7 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> (1 << KVM_FEATURE_NOP_IO_DELAY) |
> (1 << KVM_FEATURE_CLOCKSOURCE2) |
> (1 << KVM_FEATURE_ASYNC_PF) |
> + (1 << KVM_FEATURE_EOI) |
> (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
>
> if (sched_info_on())
> diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
> index 7e06ba1..07bdfab 100644
> --- a/arch/x86/kvm/irq.c
> +++ b/arch/x86/kvm/irq.c
> @@ -48,7 +48,7 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
> if (!irqchip_in_kernel(v->kvm))
> return v->arch.interrupt.pending;
>
> - if (kvm_apic_has_interrupt(v) == -1) { /* LAPIC */
> + if (kvm_apic_has_interrupt(v) < 0) { /* LAPIC */
> if (kvm_apic_accept_pic_intr(v)) {
> s = pic_irqchip(v->kvm); /* PIC */
> return s->output;
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 93c1574..77e0244 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -270,6 +270,54 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq)
> irq->level, irq->trig_mode);
> }
>
> +static int eoi_put_user(struct kvm_vcpu *vcpu, u8 val)
> +{
> +
> + return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.eoi.data, &val,
> + sizeof(val));
> +}
> +
> +static int eoi_get_user(struct kvm_vcpu *vcpu, u8 *val)
> +{
> +
> + return kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.eoi.data, val,
> + sizeof(*val));
> +}
> +
> +static inline bool eoi_enabled(struct kvm_vcpu *vcpu)
> +{
> + return vcpu->arch.eoi.msr_val & KVM_MSR_ENABLED;
> +}
> +
> +static bool eoi_get_pending(struct kvm_vcpu *vcpu)
> +{
> + u8 val;
> + if (eoi_get_user(vcpu, &val) < 0)
> + apic_debug("Can't read EOI MSR value: 0x%llx\n",
> + (unsigned long long)vcpi->arch.eoi.msr_val);
> + return val & 0x1;
> +}
> +
> +static void eoi_set_pending(struct kvm_vcpu *vcpu)
> +{
> + if (eoi_put_user(vcpu, 0x1) < 0) {
> + apic_debug("Can't set EOI MSR value: 0x%llx\n",
> + (unsigned long long)vcpi->arch.eoi.msr_val);
> + return;
> + }
> + __set_bit(KVM_APIC_EOI_PENDING, &vcpu->arch.apic_attention);
> +}
> +
> +static void eoi_clr_pending(struct kvm_vcpu *vcpu)
> +{
> + if (eoi_put_user(vcpu, 0x0) < 0) {
> + apic_debug("Can't clear EOI MSR value: 0x%llx\n",
> + (unsigned long long)vcpi->arch.eoi.msr_val);
> + return;
> + }
> + __clear_bit(KVM_APIC_EOI_PENDING, &vcpu->arch.apic_attention);
> +}
> +
> static inline int apic_find_highest_isr(struct kvm_lapic *apic)
> {
> int result;
> @@ -482,16 +530,21 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
> return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio;
> }
>
> -static void apic_set_eoi(struct kvm_lapic *apic)
> +static int apic_set_eoi(struct kvm_lapic *apic)
> {
> int vector = apic_find_highest_isr(apic);
> +
> + trace_kvm_eoi(apic, vector);
> +
> /*
> * Not every write EOI will has corresponding ISR,
> * one example is when Kernel check timer on setup_IO_APIC
> */
> if (vector == -1)
> - return;
> + return vector;
>
> + if (eoi_enabled(apic->vcpu))
> + eoi_clr_pending(apic->vcpu);
> apic_clear_vector(vector, apic->regs + APIC_ISR);
> apic_update_ppr(apic);
>
> @@ -505,6 +558,7 @@ static void apic_set_eoi(struct kvm_lapic *apic)
> kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
> }
> kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
> + return vector;
> }
>
> static void apic_send_ipi(struct kvm_lapic *apic)
> @@ -1085,6 +1139,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
> atomic_set(&apic->lapic_timer.pending, 0);
> if (kvm_vcpu_is_bsp(vcpu))
> vcpu->arch.apic_base |= MSR_IA32_APICBASE_BSP;
> + vcpu->arch.eoi.msr_val = 0;
> apic_update_ppr(apic);
>
> vcpu->arch.apic_arb_prio = 0;
> @@ -1211,9 +1266,10 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
>
> apic_update_ppr(apic);
> highest_irr = apic_find_highest_irr(apic);
> - if ((highest_irr == -1) ||
> - ((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)))
> + if (highest_irr == -1)
> return -1;
> + if (((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)))
> + return -2;
> return highest_irr;
> }
>
> @@ -1245,9 +1301,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> int vector = kvm_apic_has_interrupt(vcpu);
> struct kvm_lapic *apic = vcpu->arch.apic;
>
> - if (vector == -1)
> + /* Detect interrupt nesting and disable EOI optimization */
> + if (eoi_enabled(vcpu) && vector == -2)
> + eoi_clr_pending(vcpu);
> +
> + if (vector < 0)
> return -1;
>
> + if (eoi_enabled(vcpu)) {
> + if (kvm_ioapic_handles_vector(vcpu->kvm, vector))
> + eoi_clr_pending(vcpu);
> + else
> + eoi_set_pending(vcpu);
> + }
> +
> apic_set_vector(vector, apic->regs + APIC_ISR);
> apic_update_ppr(apic);
> apic_clear_irr(vector, apic);
> @@ -1262,6 +1329,14 @@ void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu)
> MSR_IA32_APICBASE_BASE;
> kvm_apic_set_version(vcpu);
>
> + if (eoi_enabled(apic->vcpu)) {
> + if (eoi_get_pending(apic->vcpu))
> + __set_bit(KVM_APIC_EOI_PENDING,
> + &vcpu->arch.apic_attention);
> + else
> + __clear_bit(KVM_APIC_EOI_PENDING,
> + &vcpu->arch.apic_attention);
> + }
> apic_update_ppr(apic);
> hrtimer_cancel(&apic->lapic_timer.timer);
> update_divide_count(apic);
> @@ -1283,11 +1358,25 @@ void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
> hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
> }
>
> +static void apic_update_eoi(struct kvm_lapic *apic)
> +{
> + int vector;
> + BUG_ON(!eoi_enabled(apic->vcpu));
> + if (eoi_get_pending(apic->vcpu))
> + return;
> + __clear_bit(KVM_APIC_EOI_PENDING, &apic->vcpu->arch.apic_attention);
> + vector = apic_set_eoi(apic);
> + trace_kvm_pv_eoi(apic, vector);
> +}
> +
> void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu)
> {
> u32 data;
> void *vapic;
>
> + if (test_bit(KVM_APIC_EOI_PENDING, &vcpu->arch.apic_attention))
> + apic_update_eoi(vcpu->arch.apic);
> +
> if (!test_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention))
> return;
>
> @@ -1394,3 +1483,13 @@ int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, u32 reg, u64 *data)
>
> return 0;
> }
> +
> +int kvm_pv_enable_apic_eoi(struct kvm_vcpu *vcpu, u64 data)
> +{
> + u64 addr = data & ~KVM_MSR_ENABLED;
Since documentation is missing I do not know what alignment addr should
have, but if it is more than 2 bytes we should check if reserved bits
are not zero and #GP if they are.
> + if (eoi_enabled(vcpu))
> + eoi_clr_pending(vcpu);
> + vcpu->arch.eoi.msr_val = data;
> + kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.eoi.data, addr);
This may fail.
> + return 0;
> +}
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index 6f4ce25..042dace 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -60,4 +60,6 @@ static inline bool kvm_hv_vapic_assist_page_enabled(struct kvm_vcpu *vcpu)
> {
> return vcpu->arch.hv_vapic & HV_X64_MSR_APIC_ASSIST_PAGE_ENABLE;
> }
> +
> +int kvm_pv_enable_apic_eoi(struct kvm_vcpu *vcpu, u64 data);
> #endif
> diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
> index 911d264..851914e 100644
> --- a/arch/x86/kvm/trace.h
> +++ b/arch/x86/kvm/trace.h
> @@ -517,6 +517,40 @@ TRACE_EVENT(kvm_apic_accept_irq,
> __entry->coalesced ? " (coalesced)" : "")
> );
>
> +TRACE_EVENT(kvm_eoi,
> + TP_PROTO(struct kvm_lapic *apic, int vector),
> + TP_ARGS(apic, vector),
> +
> + TP_STRUCT__entry(
> + __field( __u32, apicid )
> + __field( int, vector )
> + ),
> +
> + TP_fast_assign(
> + __entry->apicid = apic->vcpu->vcpu_id;
> + __entry->vector = vector;
> + ),
> +
> + TP_printk("apicid %x vector %d", __entry->apicid, __entry->vector)
> +);
> +
> +TRACE_EVENT(kvm_pv_eoi,
> + TP_PROTO(struct kvm_lapic *apic, int vector),
> + TP_ARGS(apic, vector),
> +
> + TP_STRUCT__entry(
> + __field( __u32, apicid )
> + __field( int, vector )
> + ),
> +
> + TP_fast_assign(
> + __entry->apicid = apic->vcpu->vcpu_id;
> + __entry->vector = vector;
> + ),
> +
> + TP_printk("apicid %x vector %d", __entry->apicid, __entry->vector)
> +);
> +
> /*
> * Tracepoint for nested VMRUN
> */
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 185a2b8..bfcd97d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -795,6 +795,7 @@ static u32 msrs_to_save[] = {
> MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
> HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
> HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
> + MSR_KVM_EOI_EN,
> MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
> MSR_STAR,
> #ifdef CONFIG_X86_64
> @@ -1653,6 +1654,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
> kvm_make_request(KVM_REQ_STEAL_UPDATE, vcpu);
>
> break;
> + case MSR_KVM_EOI_EN:
> + if (kvm_pv_enable_apic_eoi(vcpu, data))
> + return 1;
> + break;
>
> case MSR_IA32_MCG_CTL:
> case MSR_IA32_MCG_STATUS:
> @@ -5370,7 +5375,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> if (unlikely(vcpu->arch.tsc_always_catchup))
> kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
>
> - kvm_lapic_sync_from_vapic(vcpu);
> + if (unlikely(vcpu->arch.apic_attention))
> + kvm_lapic_sync_from_vapic(vcpu);
>
> r = kvm_x86_ops->handle_exit(vcpu);
> out:
> --
> MST
--
Gleb.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCHv2 7/7] kvm: host side for eoi optimization
2012-05-13 9:33 ` Gleb Natapov
@ 2012-05-13 10:01 ` Michael S. Tsirkin
2012-05-13 10:04 ` Gleb Natapov
2012-05-13 15:15 ` Michael S. Tsirkin
1 sibling, 1 reply; 12+ messages in thread
From: Michael S. Tsirkin @ 2012-05-13 10:01 UTC (permalink / raw)
To: Gleb Natapov
Cc: x86, kvm, Ingo Molnar, H. Peter Anvin, Avi Kivity,
Marcelo Tosatti, Linus Torvalds, linux-kernel
On Sun, May 13, 2012 at 12:33:17PM +0300, Gleb Natapov wrote:
> On Fri, May 11, 2012 at 10:38:40AM +0300, Michael S. Tsirkin wrote:
> > Implementation of PV EOI using shared memory.
> > This reduces the number of exits an interrupt
> > causes as much as by half.
> >
> > The idea is simple: there's a bit, per APIC, in guest memory,
> > that tells the guest that it does not need EOI.
> > We set it before injecting an interrupt and clear
> > before injecting a nested one. Guest tests it using
> > a test and clear operation - this is necessary
> > so that host can detect interrupt nesting -
> > and if set, it can skip the EOI MSR.
> >
> > There's a new MSR to set the address of said register
> > in guest memory. Otherwise not much changed:
> > - Guest EOI is not required
> > - Register is tested & ISR is automatically cleared on exit
> >
> > This patch is on top of kvm.git.
> >
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> > arch/x86/include/asm/kvm_host.h | 6 ++
> > arch/x86/include/asm/kvm_para.h | 2 +
> > arch/x86/kvm/cpuid.c | 1 +
> > arch/x86/kvm/irq.c | 2 +-
> > arch/x86/kvm/lapic.c | 109 +++++++++++++++++++++++++++++++++++++--
> > arch/x86/kvm/lapic.h | 2 +
> > arch/x86/kvm/trace.h | 34 ++++++++++++
> > arch/x86/kvm/x86.c | 8 +++-
> Documentation ++++++ :)
Right. Will do. Documentation/virtual/kvm/msr.txt
is probably enough?
> > 8 files changed, 157 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 32297f2..7673f4d 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -174,6 +174,7 @@ enum {
> >
> > /* apic attention bits */
> > #define KVM_APIC_CHECK_VAPIC 0
> > +#define KVM_APIC_EOI_PENDING 1
> >
> > /*
> > * We don't want allocation failures within the mmu code, so we preallocate
> > @@ -485,6 +486,11 @@ struct kvm_vcpu_arch {
> > u64 length;
> > u64 status;
> > } osvw;
> > +
> > + struct {
> > + u64 msr_val;
> > + struct gfn_to_hva_cache data;
> > + } eoi;
> > };
> >
> > struct kvm_lpage_info {
> > diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> > index 734c376..195840d 100644
> > --- a/arch/x86/include/asm/kvm_para.h
> > +++ b/arch/x86/include/asm/kvm_para.h
> > @@ -22,6 +22,7 @@
> > #define KVM_FEATURE_CLOCKSOURCE2 3
> > #define KVM_FEATURE_ASYNC_PF 4
> > #define KVM_FEATURE_STEAL_TIME 5
> > +#define KVM_FEATURE_EOI 6
> >
> > /* The last 8 bits are used to indicate how to interpret the flags field
> > * in pvclock structure. If no bits are set, all flags are ignored.
> > @@ -37,6 +38,7 @@
> > #define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
> > #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
> > #define MSR_KVM_STEAL_TIME 0x4b564d03
> > +#define MSR_KVM_EOI_EN 0x4b564d04
> >
> > struct kvm_steal_time {
> > __u64 steal;
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index eba8345..bda4877 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -408,6 +408,7 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> > (1 << KVM_FEATURE_NOP_IO_DELAY) |
> > (1 << KVM_FEATURE_CLOCKSOURCE2) |
> > (1 << KVM_FEATURE_ASYNC_PF) |
> > + (1 << KVM_FEATURE_EOI) |
> > (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
> >
> > if (sched_info_on())
> > diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
> > index 7e06ba1..07bdfab 100644
> > --- a/arch/x86/kvm/irq.c
> > +++ b/arch/x86/kvm/irq.c
> > @@ -48,7 +48,7 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
> > if (!irqchip_in_kernel(v->kvm))
> > return v->arch.interrupt.pending;
> >
> > - if (kvm_apic_has_interrupt(v) == -1) { /* LAPIC */
> > + if (kvm_apic_has_interrupt(v) < 0) { /* LAPIC */
> > if (kvm_apic_accept_pic_intr(v)) {
> > s = pic_irqchip(v->kvm); /* PIC */
> > return s->output;
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 93c1574..77e0244 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -270,6 +270,54 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq)
> > irq->level, irq->trig_mode);
> > }
> >
> > +static int eoi_put_user(struct kvm_vcpu *vcpu, u8 val)
> > +{
> > +
> > + return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.eoi.data, &val,
> > + sizeof(val));
> > +}
> > +
> > +static int eoi_get_user(struct kvm_vcpu *vcpu, u8 *val)
> > +{
> > +
> > + return kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.eoi.data, val,
> > + sizeof(*val));
> > +}
> > +
> > +static inline bool eoi_enabled(struct kvm_vcpu *vcpu)
> > +{
> > + return vcpu->arch.eoi.msr_val & KVM_MSR_ENABLED;
> > +}
> > +
> > +static bool eoi_get_pending(struct kvm_vcpu *vcpu)
> > +{
> > + u8 val;
> > + if (eoi_get_user(vcpu, &val) < 0)
> > + apic_debug("Can't read EOI MSR value: 0x%llx\n",
> > + (unsigned long long)vcpi->arch.eoi.msr_val);
> > + return val & 0x1;
> > +}
> > +
> > +static void eoi_set_pending(struct kvm_vcpu *vcpu)
> > +{
> > + if (eoi_put_user(vcpu, 0x1) < 0) {
> > + apic_debug("Can't set EOI MSR value: 0x%llx\n",
> > + (unsigned long long)vcpi->arch.eoi.msr_val);
> > + return;
> > + }
> > + __set_bit(KVM_APIC_EOI_PENDING, &vcpu->arch.apic_attention);
> > +}
> > +
> > +static void eoi_clr_pending(struct kvm_vcpu *vcpu)
> > +{
> > + if (eoi_put_user(vcpu, 0x0) < 0) {
> > + apic_debug("Can't clear EOI MSR value: 0x%llx\n",
> > + (unsigned long long)vcpi->arch.eoi.msr_val);
> > + return;
> > + }
> > + __clear_bit(KVM_APIC_EOI_PENDING, &vcpu->arch.apic_attention);
> > +}
> > +
> > static inline int apic_find_highest_isr(struct kvm_lapic *apic)
> > {
> > int result;
> > @@ -482,16 +530,21 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
> > return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio;
> > }
> >
> > -static void apic_set_eoi(struct kvm_lapic *apic)
> > +static int apic_set_eoi(struct kvm_lapic *apic)
> > {
> > int vector = apic_find_highest_isr(apic);
> > +
> > + trace_kvm_eoi(apic, vector);
> > +
> > /*
> > * Not every write EOI will has corresponding ISR,
> > * one example is when Kernel check timer on setup_IO_APIC
> > */
> > if (vector == -1)
> > - return;
> > + return vector;
> >
> > + if (eoi_enabled(apic->vcpu))
> > + eoi_clr_pending(apic->vcpu);
> > apic_clear_vector(vector, apic->regs + APIC_ISR);
> > apic_update_ppr(apic);
> >
> > @@ -505,6 +558,7 @@ static void apic_set_eoi(struct kvm_lapic *apic)
> > kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
> > }
> > kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
> > + return vector;
> > }
> >
> > static void apic_send_ipi(struct kvm_lapic *apic)
> > @@ -1085,6 +1139,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
> > atomic_set(&apic->lapic_timer.pending, 0);
> > if (kvm_vcpu_is_bsp(vcpu))
> > vcpu->arch.apic_base |= MSR_IA32_APICBASE_BSP;
> > + vcpu->arch.eoi.msr_val = 0;
> > apic_update_ppr(apic);
> >
> > vcpu->arch.apic_arb_prio = 0;
> > @@ -1211,9 +1266,10 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
> >
> > apic_update_ppr(apic);
> > highest_irr = apic_find_highest_irr(apic);
> > - if ((highest_irr == -1) ||
> > - ((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)))
> > + if (highest_irr == -1)
> > return -1;
> > + if (((highest_irr & 0xF0) <= apic_get_reg(apic, APIC_PROCPRI)))
> > + return -2;
> > return highest_irr;
> > }
> >
> > @@ -1245,9 +1301,20 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> > int vector = kvm_apic_has_interrupt(vcpu);
> > struct kvm_lapic *apic = vcpu->arch.apic;
> >
> > - if (vector == -1)
> > + /* Detect interrupt nesting and disable EOI optimization */
> > + if (eoi_enabled(vcpu) && vector == -2)
> > + eoi_clr_pending(vcpu);
> > +
> > + if (vector < 0)
> > return -1;
> >
> > + if (eoi_enabled(vcpu)) {
> > + if (kvm_ioapic_handles_vector(vcpu->kvm, vector))
> > + eoi_clr_pending(vcpu);
> > + else
> > + eoi_set_pending(vcpu);
> > + }
> > +
> > apic_set_vector(vector, apic->regs + APIC_ISR);
> > apic_update_ppr(apic);
> > apic_clear_irr(vector, apic);
> > @@ -1262,6 +1329,14 @@ void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu)
> > MSR_IA32_APICBASE_BASE;
> > kvm_apic_set_version(vcpu);
> >
> > + if (eoi_enabled(apic->vcpu)) {
> > + if (eoi_get_pending(apic->vcpu))
> > + __set_bit(KVM_APIC_EOI_PENDING,
> > + &vcpu->arch.apic_attention);
> > + else
> > + __clear_bit(KVM_APIC_EOI_PENDING,
> > + &vcpu->arch.apic_attention);
> > + }
> > apic_update_ppr(apic);
> > hrtimer_cancel(&apic->lapic_timer.timer);
> > update_divide_count(apic);
> > @@ -1283,11 +1358,25 @@ void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
> > hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
> > }
> >
> > +static void apic_update_eoi(struct kvm_lapic *apic)
> > +{
> > + int vector;
> > + BUG_ON(!eoi_enabled(apic->vcpu));
> > + if (eoi_get_pending(apic->vcpu))
> > + return;
> > + __clear_bit(KVM_APIC_EOI_PENDING, &apic->vcpu->arch.apic_attention);
> > + vector = apic_set_eoi(apic);
> > + trace_kvm_pv_eoi(apic, vector);
> > +}
> > +
> > void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu)
> > {
> > u32 data;
> > void *vapic;
> >
> > + if (test_bit(KVM_APIC_EOI_PENDING, &vcpu->arch.apic_attention))
> > + apic_update_eoi(vcpu->arch.apic);
> > +
> > if (!test_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention))
> > return;
> >
> > @@ -1394,3 +1483,13 @@ int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, u32 reg, u64 *data)
> >
> > return 0;
> > }
> > +
> > +int kvm_pv_enable_apic_eoi(struct kvm_vcpu *vcpu, u64 data)
> > +{
> > + u64 addr = data & ~KVM_MSR_ENABLED;
> Since documentation is missing I do not know what alignment addr should
> have, but if it is more than 2 bytes we should check if reserved bits
> are not zero and #GP if they are.
Two bytes are required.
> > + if (eoi_enabled(vcpu))
> > + eoi_clr_pending(vcpu);
> > + vcpu->arch.eoi.msr_val = data;
> > + kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.eoi.data, addr);
> This may fail.
On invalid address, yes. Then the cache hva will be set to invalid
so accesses will fail too. So a malicious guest is only hurting
itself.
Maybe add a comment here?
> > + return 0;
> > +}
> > diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> > index 6f4ce25..042dace 100644
> > --- a/arch/x86/kvm/lapic.h
> > +++ b/arch/x86/kvm/lapic.h
> > @@ -60,4 +60,6 @@ static inline bool kvm_hv_vapic_assist_page_enabled(struct kvm_vcpu *vcpu)
> > {
> > return vcpu->arch.hv_vapic & HV_X64_MSR_APIC_ASSIST_PAGE_ENABLE;
> > }
> > +
> > +int kvm_pv_enable_apic_eoi(struct kvm_vcpu *vcpu, u64 data);
> > #endif
> > diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
> > index 911d264..851914e 100644
> > --- a/arch/x86/kvm/trace.h
> > +++ b/arch/x86/kvm/trace.h
> > @@ -517,6 +517,40 @@ TRACE_EVENT(kvm_apic_accept_irq,
> > __entry->coalesced ? " (coalesced)" : "")
> > );
> >
> > +TRACE_EVENT(kvm_eoi,
> > + TP_PROTO(struct kvm_lapic *apic, int vector),
> > + TP_ARGS(apic, vector),
> > +
> > + TP_STRUCT__entry(
> > + __field( __u32, apicid )
> > + __field( int, vector )
> > + ),
> > +
> > + TP_fast_assign(
> > + __entry->apicid = apic->vcpu->vcpu_id;
> > + __entry->vector = vector;
> > + ),
> > +
> > + TP_printk("apicid %x vector %d", __entry->apicid, __entry->vector)
> > +);
> > +
> > +TRACE_EVENT(kvm_pv_eoi,
> > + TP_PROTO(struct kvm_lapic *apic, int vector),
> > + TP_ARGS(apic, vector),
> > +
> > + TP_STRUCT__entry(
> > + __field( __u32, apicid )
> > + __field( int, vector )
> > + ),
> > +
> > + TP_fast_assign(
> > + __entry->apicid = apic->vcpu->vcpu_id;
> > + __entry->vector = vector;
> > + ),
> > +
> > + TP_printk("apicid %x vector %d", __entry->apicid, __entry->vector)
> > +);
> > +
> > /*
> > * Tracepoint for nested VMRUN
> > */
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 185a2b8..bfcd97d 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -795,6 +795,7 @@ static u32 msrs_to_save[] = {
> > MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
> > HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
> > HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
> > + MSR_KVM_EOI_EN,
> > MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
> > MSR_STAR,
> > #ifdef CONFIG_X86_64
> > @@ -1653,6 +1654,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
> > kvm_make_request(KVM_REQ_STEAL_UPDATE, vcpu);
> >
> > break;
> > + case MSR_KVM_EOI_EN:
> > + if (kvm_pv_enable_apic_eoi(vcpu, data))
> > + return 1;
> > + break;
> >
> > case MSR_IA32_MCG_CTL:
> > case MSR_IA32_MCG_STATUS:
> > @@ -5370,7 +5375,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> > if (unlikely(vcpu->arch.tsc_always_catchup))
> > kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
> >
> > - kvm_lapic_sync_from_vapic(vcpu);
> > + if (unlikely(vcpu->arch.apic_attention))
> > + kvm_lapic_sync_from_vapic(vcpu);
> >
> > r = kvm_x86_ops->handle_exit(vcpu);
> > out:
> > --
> > MST
>
> --
> Gleb.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCHv2 7/7] kvm: host side for eoi optimization
2012-05-13 10:01 ` Michael S. Tsirkin
@ 2012-05-13 10:04 ` Gleb Natapov
0 siblings, 0 replies; 12+ messages in thread
From: Gleb Natapov @ 2012-05-13 10:04 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: x86, kvm, Ingo Molnar, H. Peter Anvin, Avi Kivity,
Marcelo Tosatti, Linus Torvalds, linux-kernel
On Sun, May 13, 2012 at 01:01:28PM +0300, Michael S. Tsirkin wrote:
> > > + if (eoi_enabled(vcpu))
> > > + eoi_clr_pending(vcpu);
> > > + vcpu->arch.eoi.msr_val = data;
> > > + kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.eoi.data, addr);
> > This may fail.
>
> On invalid address, yes. Then the cache hva will be set to invalid
> so accesses will fail too. So a malicious guest is only hurting
> itself.
> Maybe add a comment here?
>
Kill it with #GP like other MSRs do.
--
Gleb.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCHv2 7/7] kvm: host side for eoi optimization
2012-05-13 9:33 ` Gleb Natapov
2012-05-13 10:01 ` Michael S. Tsirkin
@ 2012-05-13 15:15 ` Michael S. Tsirkin
1 sibling, 0 replies; 12+ messages in thread
From: Michael S. Tsirkin @ 2012-05-13 15:15 UTC (permalink / raw)
To: Gleb Natapov
Cc: x86, kvm, Ingo Molnar, H. Peter Anvin, Avi Kivity,
Marcelo Tosatti, Linus Torvalds, linux-kernel
On Sun, May 13, 2012 at 12:33:17PM +0300, Gleb Natapov wrote:
> > arch/x86/include/asm/kvm_host.h | 6 ++
> > arch/x86/include/asm/kvm_para.h | 2 +
> > arch/x86/kvm/cpuid.c | 1 +
> > arch/x86/kvm/irq.c | 2 +-
> > arch/x86/kvm/lapic.c | 109 +++++++++++++++++++++++++++++++++++++--
> > arch/x86/kvm/lapic.h | 2 +
> > arch/x86/kvm/trace.h | 34 ++++++++++++
> > arch/x86/kvm/x86.c | 8 +++-
> Documentation ++++++ :)
I just sent out some documentation. Didn't want to
repost the whole patchset to acoid spamming the list,
I hope you and others can use it for review.
Thanks,
MST
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2012-05-13 15:15 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <cover.1336679924.git.mst@redhat.com>
2012-05-11 7:38 ` [PATCHv2 1/7] apic: fix typo EIO_ACK -> EOI_ACK and document Michael S. Tsirkin
2012-05-11 7:38 ` [PATCHv2 2/7] apic: use symbolic APIC_EOI_ACK Michael S. Tsirkin
2012-05-11 7:38 ` [PATCHv2 3/7] x86: add apic->eoi_write callback Michael S. Tsirkin
2012-05-11 7:38 ` [PATCHv2 4/7] x86: eoi micro-optimization Michael S. Tsirkin
2012-05-11 7:38 ` [PATCHv2 5/7] kvm_para: guest side for eoi avoidance Michael S. Tsirkin
2012-05-11 7:38 ` [PATCHv2 6/7] x86/bitops: note on __test_and_clear_bit atomicity Michael S. Tsirkin
2012-05-11 7:38 ` [PATCHv2 7/7] kvm: host side for eoi optimization Michael S. Tsirkin
2012-05-13 9:33 ` Gleb Natapov
2012-05-13 10:01 ` Michael S. Tsirkin
2012-05-13 10:04 ` Gleb Natapov
2012-05-13 15:15 ` Michael S. Tsirkin
2012-05-11 7:43 ` [PATCH] qemu: whitelist kvm pv eoi feature Michael S. Tsirkin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).