* [RFC 0/3] Export APICv-related state via binary stats interface
@ 2024-02-15 16:01 Alejandro Jimenez
2024-02-15 16:01 ` [RFC 1/3] x86: KVM: stats: Add a stat to report status of APICv inhibition Alejandro Jimenez
` (4 more replies)
0 siblings, 5 replies; 25+ messages in thread
From: Alejandro Jimenez @ 2024-02-15 16:01 UTC (permalink / raw)
To: kvm
Cc: seanjc, pbonzini, linux-kernel, joao.m.martins, boris.ostrovsky,
mark.kanda, suravee.suthikulpanit, mlevitsk, alejandro.j.jimenez
The goal of this RFC is to agree on a mechanism for querying the state (and
related stats) of APICv/AVIC. I clearly have an AVIC bias when approaching this
topic, since that is the side I have mostly looked at and it has the greater
number of possible inhibits, but I believe the argument applies to both
vendors' technologies.
Currently, a user or monitoring app trying to determine if APICv is actually
being used needs implementation-specific knowledge in order to look for specific
types of #VMEXIT (e.g. AVIC_INCOMPLETE_IPI/AVIC_NOACCEL), check GALog events
by watching /proc/interrupts for AMD-Vi*-GA, etc. There are existing tracepoints
(e.g. kvm_apicv_accept_irq, kvm_avic_ga_log) that make this task easier, but
tracefs is not viable in some scenarios. Adding kvm debugfs entries has similar
downsides. Suravee has previously proposed a new IOCTL interface[0] to expose
this information, but there has not been any development in that direction.
Sean has mentioned a preference for using BPF to extract info from the current
tracepoints, which would require reworking existing structs to access some
desired data, but as far as I know there isn't any work done on that approach
yet.
Recently Joao mentioned another alternative: the binary stats framework that is
already supported by kernel[1] and QEMU[2]. This RFC has minimal code changes to
expose the relevant info based on the existing data types the framework already
supports. If there is consensus on using this approach, I can expand the fd
stats subsystem to include other data types (e.g. a bitmap type for exposing the
inhibit reasons), as well as add documentation to KVM explaining which stats
are relevant for APICv and how to query them.
A basic example of retrieving the stats via qmp-shell, showing both a VM and
per-vCPU case:
# /usr/local/bin/qmp-shell --pretty ./qmp-sock
(QEMU) query-stats target=vm providers=[{'provider':'kvm','names':['apicv_inhibited']}]
{
"return": [
{
"provider": "kvm",
"stats": [
{
"name": "apicv_inhibited",
"value": false
}
]
}
]
}
(QEMU) query-stats target=vcpu vcpus=['/machine/unattached/device[0]'] providers=[{'provider':'kvm','names':['apicv_accept_irq','ga_log_event']}]
{
"return": [
{
"provider": "kvm",
"qom-path": "/machine/unattached/device[0]",
"stats": [
{
"name": "ga_log_event",
"value": 98
},
{
"name": "apicv_accept_irq",
"value": 166920
}
]
}
]
}
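For completeness, here is a rough sketch of how a userspace consumer could
decode the stats fd directly, without going through QEMU. The header and
descriptor layout follows the kvm_stats_header/kvm_stats_desc structures from
the binary stats documentation; the in-memory buffer below is purely
illustrative (a real consumer would pread() from the fd returned by
KVM_GET_STATS_FD):

```python
import struct

# struct kvm_stats_header: flags, name_size, num_desc, id_offset,
# desc_offset, data_offset (all __u32).
HEADER_FMT = "<6I"
# struct kvm_stats_desc (fixed part): flags (__u32), exponent (__s16),
# size (__u16), offset (__u32), bucket_size (__u32), then name[name_size].
DESC_FMT = "<IhHII"

def parse_stats(buf):
    """Return {stat_name: value} from a binary stats buffer."""
    _, name_size, num_desc, _, desc_off, data_off = struct.unpack_from(HEADER_FMT, buf, 0)
    fixed = struct.calcsize(DESC_FMT)
    desc_sz = fixed + name_size
    stats = {}
    for i in range(num_desc):
        base = desc_off + i * desc_sz
        _, _, size, off, _ = struct.unpack_from(DESC_FMT, buf, base)
        name = buf[base + fixed:base + desc_sz].split(b"\0")[0].decode()
        # Each stat is 'size' __u64 values at data_offset + offset.
        vals = struct.unpack_from("<%dQ" % size, buf, data_off + off)
        stats[name] = vals[0] if size == 1 else list(vals)
    return stats

# Illustrative buffer: a single boolean stat, 'apicv_inhibited' = 1.
hdr = struct.pack(HEADER_FMT, 0, 16, 1, 24, 24, 56)
desc = struct.pack(DESC_FMT, 0, 0, 1, 0, 0) + b"apicv_inhibited\0"
data = struct.pack("<Q", 1)
print(parse_stats(hdr + desc + data))  # {'apicv_inhibited': 1}
```

In practice the header and descriptors only need to be read once per fd, and
the data block can be re-read periodically to poll the counters.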
If other alternatives are preferred, please use this thread to discuss them,
and I can take a shot at implementing the desired solution.
Regards,
Alejandro
[0] https://lore.kernel.org/qemu-devel/7e0d22fa-b9b0-ad1a-3a37-a450ec5d73e8@amd.com/
[1] https://lore.kernel.org/all/20210618222709.1858088-1-jingzhangos@google.com/
[2] https://lore.kernel.org/qemu-devel/20220530150714.756954-1-pbonzini@redhat.com/
Alejandro Jimenez (3):
x86: KVM: stats: Add a stat to report status of APICv inhibition
x86: KVM: stats: Add stat counter for IRQs injected via APICv
x86: KVM: stats: Add a stat counter for GALog events
arch/x86/include/asm/kvm_host.h | 3 +++
arch/x86/kvm/svm/avic.c | 4 +++-
arch/x86/kvm/svm/svm.c | 3 +++
arch/x86/kvm/vmx/vmx.c | 2 ++
arch/x86/kvm/x86.c | 12 +++++++++++-
5 files changed, 22 insertions(+), 2 deletions(-)
base-commit: 7455665a3521aa7b56245c0a2810f748adc5fdd4
--
2.39.3
^ permalink raw reply [flat|nested] 25+ messages in thread
* [RFC 1/3] x86: KVM: stats: Add a stat to report status of APICv inhibition
2024-02-15 16:01 [RFC 0/3] Export APICv-related state via binary stats interface Alejandro Jimenez
@ 2024-02-15 16:01 ` Alejandro Jimenez
2024-04-09 14:54 ` Suthikulpanit, Suravee
2024-04-16 18:19 ` Sean Christopherson
2024-02-15 16:01 ` [RFC 2/3] x86: KVM: stats: Add stat counter for IRQs injected via APICv Alejandro Jimenez
` (3 subsequent siblings)
4 siblings, 2 replies; 25+ messages in thread
From: Alejandro Jimenez @ 2024-02-15 16:01 UTC (permalink / raw)
To: kvm
Cc: seanjc, pbonzini, linux-kernel, joao.m.martins, boris.ostrovsky,
mark.kanda, suravee.suthikulpanit, mlevitsk, alejandro.j.jimenez
The inhibition status of APICv can currently be checked using the
'kvm_apicv_inhibit_changed' tracepoint, but this is not accessible if
tracefs is not available (e.g. kernel lockdown, non-root user). Export
inhibition status as a binary stat that can be monitored from userspace
without elevated privileges.
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/x86.c | 10 +++++++++-
2 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ad5319a503f0..9b960a523715 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1524,6 +1524,7 @@ struct kvm_vm_stat {
u64 nx_lpage_splits;
u64 max_mmu_page_hash_collisions;
u64 max_mmu_rmap_size;
+ u64 apicv_inhibited;
};
struct kvm_vcpu_stat {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b66c45e7f6f8..f7f598f066e7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -255,7 +255,8 @@ const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
STATS_DESC_ICOUNTER(VM, pages_1g),
STATS_DESC_ICOUNTER(VM, nx_lpage_splits),
STATS_DESC_PCOUNTER(VM, max_mmu_rmap_size),
- STATS_DESC_PCOUNTER(VM, max_mmu_page_hash_collisions)
+ STATS_DESC_PCOUNTER(VM, max_mmu_page_hash_collisions),
+ STATS_DESC_IBOOLEAN(VM, apicv_inhibited)
};
const struct kvm_stats_header kvm_vm_stats_header = {
@@ -10588,6 +10589,13 @@ void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
*/
kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_UPDATE);
kvm->arch.apicv_inhibit_reasons = new;
+
+ /*
+ * Update inhibition statistic only when toggling APICv
+ * activation status.
+ */
+ kvm->stat.apicv_inhibited = !!new;
+
if (new) {
unsigned long gfn = gpa_to_gfn(APIC_DEFAULT_PHYS_BASE);
int idx = srcu_read_lock(&kvm->srcu);
--
2.39.3
* [RFC 2/3] x86: KVM: stats: Add stat counter for IRQs injected via APICv
2024-02-15 16:01 [RFC 0/3] Export APICv-related state via binary stats interface Alejandro Jimenez
2024-02-15 16:01 ` [RFC 1/3] x86: KVM: stats: Add a stat to report status of APICv inhibition Alejandro Jimenez
@ 2024-02-15 16:01 ` Alejandro Jimenez
2024-02-15 16:16 ` Dongli Zhang
2024-02-15 16:01 ` [RFC 3/3] x86: KVM: stats: Add a stat counter for GALog events Alejandro Jimenez
` (2 subsequent siblings)
4 siblings, 1 reply; 25+ messages in thread
From: Alejandro Jimenez @ 2024-02-15 16:01 UTC (permalink / raw)
To: kvm
Cc: seanjc, pbonzini, linux-kernel, joao.m.martins, boris.ostrovsky,
mark.kanda, suravee.suthikulpanit, mlevitsk, alejandro.j.jimenez
Export a binary stat counting how many interrupts have been delivered via
APICv/AVIC acceleration from the host. This is one of the most reliable
methods to detect when hardware accelerated interrupt delivery is active,
since APIC timer interrupts are regularly injected and exercise these
code paths.
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/svm/svm.c | 3 +++
arch/x86/kvm/vmx/vmx.c | 2 ++
arch/x86/kvm/x86.c | 1 +
4 files changed, 7 insertions(+)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9b960a523715..b6f18084d504 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1564,6 +1564,7 @@ struct kvm_vcpu_stat {
u64 preemption_other;
u64 guest_mode;
u64 notify_window_exits;
+ u64 apicv_accept_irq;
};
struct x86_instruction_info;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e90b429c84f1..2243af08ed39 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3648,6 +3648,9 @@ void svm_complete_interrupt_delivery(struct kvm_vcpu *vcpu, int delivery_mode,
}
trace_kvm_apicv_accept_irq(vcpu->vcpu_id, delivery_mode, trig_mode, vector);
+
+ ++vcpu->stat.apicv_accept_irq;
+
if (in_guest_mode) {
/*
* Signal the doorbell to tell hardware to inject the IRQ. If
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index d4e6625e0a9a..f7db75ae2c55 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4275,6 +4275,8 @@ static void vmx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
} else {
trace_kvm_apicv_accept_irq(vcpu->vcpu_id, delivery_mode,
trig_mode, vector);
+
+ ++vcpu->stat.apicv_accept_irq;
}
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f7f598f066e7..2ad70cf6e52c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -304,6 +304,7 @@ const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
STATS_DESC_COUNTER(VCPU, preemption_other),
STATS_DESC_IBOOLEAN(VCPU, guest_mode),
STATS_DESC_COUNTER(VCPU, notify_window_exits),
+ STATS_DESC_COUNTER(VCPU, apicv_accept_irq),
};
const struct kvm_stats_header kvm_vcpu_stats_header = {
--
2.39.3
* [RFC 3/3] x86: KVM: stats: Add a stat counter for GALog events
2024-02-15 16:01 [RFC 0/3] Export APICv-related state via binary stats interface Alejandro Jimenez
2024-02-15 16:01 ` [RFC 1/3] x86: KVM: stats: Add a stat to report status of APICv inhibition Alejandro Jimenez
2024-02-15 16:01 ` [RFC 2/3] x86: KVM: stats: Add stat counter for IRQs injected via APICv Alejandro Jimenez
@ 2024-02-15 16:01 ` Alejandro Jimenez
2024-04-09 6:45 ` Chao Gao
2024-04-09 5:09 ` [RFC 0/3] Export APICv-related state via binary stats interface Vasant Hegde
2024-04-16 18:08 ` Sean Christopherson
4 siblings, 1 reply; 25+ messages in thread
From: Alejandro Jimenez @ 2024-02-15 16:01 UTC (permalink / raw)
To: kvm
Cc: seanjc, pbonzini, linux-kernel, joao.m.martins, boris.ostrovsky,
mark.kanda, suravee.suthikulpanit, mlevitsk, alejandro.j.jimenez
Export a per-vCPU binary stat counting GALog events received. Such events
are specific to IOMMU AVIC, and their presence can be used to confirm that
this capability is active. Also, exporting this statistic is useful to
assert that device interrupts are being sent to specific vCPUs, confirming
IRQ affinity settings.
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/svm/avic.c | 4 +++-
arch/x86/kvm/x86.c | 1 +
3 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b6f18084d504..74e08b57f2e0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1565,6 +1565,7 @@ struct kvm_vcpu_stat {
u64 guest_mode;
u64 notify_window_exits;
u64 apicv_accept_irq;
+ u64 ga_log_event;
};
struct x86_instruction_info;
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 4b74ea91f4e6..853cafe4a9af 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -165,8 +165,10 @@ int avic_ga_log_notifier(u32 ga_tag)
* bit in the vAPIC backing page. So, we just need to schedule
* in the vcpu.
*/
- if (vcpu)
+ if (vcpu) {
kvm_vcpu_wake_up(vcpu);
+ ++vcpu->stat.ga_log_event;
+ }
return 0;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2ad70cf6e52c..6a1df29ae650 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -305,6 +305,7 @@ const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
STATS_DESC_IBOOLEAN(VCPU, guest_mode),
STATS_DESC_COUNTER(VCPU, notify_window_exits),
STATS_DESC_COUNTER(VCPU, apicv_accept_irq),
+ STATS_DESC_COUNTER(VCPU, ga_log_event),
};
const struct kvm_stats_header kvm_vcpu_stats_header = {
--
2.39.3
* Re: [RFC 2/3] x86: KVM: stats: Add stat counter for IRQs injected via APICv
2024-02-15 16:01 ` [RFC 2/3] x86: KVM: stats: Add stat counter for IRQs injected via APICv Alejandro Jimenez
@ 2024-02-15 16:16 ` Dongli Zhang
2024-02-15 18:12 ` Alejandro Jimenez
0 siblings, 1 reply; 25+ messages in thread
From: Dongli Zhang @ 2024-02-15 16:16 UTC (permalink / raw)
To: Alejandro Jimenez, kvm
Cc: seanjc, pbonzini, linux-kernel, joao.m.martins, boris.ostrovsky,
mark.kanda, suravee.suthikulpanit, mlevitsk
Hi Alejandro,
Is there a specific use case (e.g. a bug) where this counter helps?
E.g., there is already trace_kvm_apicv_accept_irq(). Wouldn't ftrace or eBPF
be able to tell if hardware accelerated interrupt delivery is active?
Any extra benefits? E.g., does this counter need to match any other counter
in KVM/the guest so that a bug can be detected? That would be very helpful.
Thank you very much!
Dongli Zhang
On 2/15/24 08:01, Alejandro Jimenez wrote:
> Export binary stat counting how many interrupts have been delivered via
> APICv/AVIC acceleration from the host. This is one of the most reliable
> methods to detect when hardware accelerated interrupt delivery is active,
> since APIC timer interrupts are regularly injected and exercise these
> code paths.
>
> Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
> ---
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/svm/svm.c | 3 +++
> arch/x86/kvm/vmx/vmx.c | 2 ++
> arch/x86/kvm/x86.c | 1 +
> 4 files changed, 7 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 9b960a523715..b6f18084d504 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1564,6 +1564,7 @@ struct kvm_vcpu_stat {
> u64 preemption_other;
> u64 guest_mode;
> u64 notify_window_exits;
> + u64 apicv_accept_irq;
> };
>
> struct x86_instruction_info;
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index e90b429c84f1..2243af08ed39 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3648,6 +3648,9 @@ void svm_complete_interrupt_delivery(struct kvm_vcpu *vcpu, int delivery_mode,
> }
>
> trace_kvm_apicv_accept_irq(vcpu->vcpu_id, delivery_mode, trig_mode, vector);
> +
> + ++vcpu->stat.apicv_accept_irq;
> +
> if (in_guest_mode) {
> /*
> * Signal the doorbell to tell hardware to inject the IRQ. If
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index d4e6625e0a9a..f7db75ae2c55 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -4275,6 +4275,8 @@ static void vmx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
> } else {
> trace_kvm_apicv_accept_irq(vcpu->vcpu_id, delivery_mode,
> trig_mode, vector);
> +
> + ++vcpu->stat.apicv_accept_irq;
> }
> }
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index f7f598f066e7..2ad70cf6e52c 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -304,6 +304,7 @@ const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
> STATS_DESC_COUNTER(VCPU, preemption_other),
> STATS_DESC_IBOOLEAN(VCPU, guest_mode),
> STATS_DESC_COUNTER(VCPU, notify_window_exits),
> + STATS_DESC_COUNTER(VCPU, apicv_accept_irq),
> };
>
> const struct kvm_stats_header kvm_vcpu_stats_header = {
* Re: [RFC 2/3] x86: KVM: stats: Add stat counter for IRQs injected via APICv
2024-02-15 16:16 ` Dongli Zhang
@ 2024-02-15 18:12 ` Alejandro Jimenez
2024-04-16 18:25 ` Sean Christopherson
0 siblings, 1 reply; 25+ messages in thread
From: Alejandro Jimenez @ 2024-02-15 18:12 UTC (permalink / raw)
To: Dongli Zhang, kvm
Cc: seanjc, pbonzini, linux-kernel, joao.m.martins, boris.ostrovsky,
mark.kanda, suravee.suthikulpanit, mlevitsk
Hi Dongli
On 2/15/24 11:16, Dongli Zhang wrote:
> Hi Alejandro,
>
> Is there a specific use case (e.g. a bug) where this counter helps?
I don't have a specific bug in mind that this is trying to address. This patch is just an example to show how existing data points (i.e. the trace_kvm_apicv_accept_irq tracepoint) can also be exposed via the stats framework with minimal overhead, and to support the point in the cover letter that querying the binary stats could be the best choice for a "single source" that tells us the full status of APICv/AVIC (i.e. are both SVM and IOMMU AVIC working, are any inhibits set, etc.).
>
> E.g., there is already trace_kvm_apicv_accept_irq(). Wouldn't ftrace or eBPF
> be able to tell if hardware accelerated interrupt delivery is active?
Yes, the tracepoint already provides this information if you know it exists AND have sufficient privileges to use tracefs or eBPF. The purpose of the RFC is to agree on a mechanism by which to expose all the APICv-relevant data (and any new additions) via a single interface, so that the sources of information are not scattered across tracepoints, debugfs entries, or data structures that need to be read via BPF.
My understanding is that the stats subsystem can work when using ftrace or bpftrace is not possible, which is why I am suggesting it be used as the "standard" method to expose this info.
There will of course be some duplication with existing tracepoints, but there is already precedent in KVM where both stats and tracepoints are updated simultaneously (e.g. mmu_{un}sync_page(), {svm|vmx}_inject_irq()).
>
> Any extra benefits? E.g., does this counter need to match any other counter
> in KVM/the guest so that a bug can be detected? That would be very helpful.
Again, I didn't have a specific scenario for using this counter, other than that the associated tracepoint is the one I typically use to determine if APICv is active. But let's think of an example on the spot: in a hypothetical scenario where I want to determine the fraction of time that a vCPU spends blocking versus in guest mode, I could add another stat, e.g.:
+
+ ++vcpu->stat.apicv_accept_irq;
+
if (in_guest_mode) {
/*
* Signal the doorbell to tell hardware to inject the IRQ. If
* the vCPU exits the guest before the doorbell chimes, hardware
* will automatically process AVIC interrupts at the next VMRUN.
*/
avic_ring_doorbell(vcpu);
+ ++vcpu->stat.avic_doorbell_rung;
} else {
/*
* Wake the vCPU if it was blocking. KVM will then detect the
* pending IRQ when checking if the vCPU has a wake event.
*/
kvm_vcpu_wake_up(vcpu);
}
and then the ratio (avic_doorbell_rung / apicv_accept_irq) lets me estimate what percentage of the time the target vCPU is idle or running. There are likely better ways of determining this, but you get the idea. The goal is to have a general consensus on whether I should add a new tracepoint (trace_kvm_avic_ring_doorbell) or a new stat as the "preferred" solution. Obviously there are still cases where a tracepoint is the best approach (e.g. it transfers more information).
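To make the estimate concrete, here is a quick sketch of the computation on the
consumer side. The sample values are made up, and avic_doorbell_rung is the
hypothetical stat sketched above, not part of the current ABI:

```python
# Estimate what fraction of accelerated IRQs found the target vCPU in
# guest mode, from one existing counter and one hypothetical counter.
def running_fraction(stats):
    accepted = stats["apicv_accept_irq"]     # all accelerated deliveries
    doorbells = stats["avic_doorbell_rung"]  # deliveries while in guest mode
    return doorbells / accepted if accepted else 0.0

# Sample values standing in for a query-stats response.
sample = {"apicv_accept_irq": 166920, "avic_doorbell_rung": 125190}
print(round(running_fraction(sample), 2))  # 0.75
```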
Hopefully I didn't stray too far from your question/point.
Alejandro
>
> Thank you very much!
>
> Dongli Zhang
>
> On 2/15/24 08:01, Alejandro Jimenez wrote:
>> Export binary stat counting how many interrupts have been delivered via
>> APICv/AVIC acceleration from the host. This is one of the most reliable
>> methods to detect when hardware accelerated interrupt delivery is active,
>> since APIC timer interrupts are regularly injected and exercise these
>> code paths.
>>
>> Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
>> ---
>> arch/x86/include/asm/kvm_host.h | 1 +
>> arch/x86/kvm/svm/svm.c | 3 +++
>> arch/x86/kvm/vmx/vmx.c | 2 ++
>> arch/x86/kvm/x86.c | 1 +
>> 4 files changed, 7 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 9b960a523715..b6f18084d504 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -1564,6 +1564,7 @@ struct kvm_vcpu_stat {
>> u64 preemption_other;
>> u64 guest_mode;
>> u64 notify_window_exits;
>> + u64 apicv_accept_irq;
>> };
>>
>> struct x86_instruction_info;
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index e90b429c84f1..2243af08ed39 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -3648,6 +3648,9 @@ void svm_complete_interrupt_delivery(struct kvm_vcpu *vcpu, int delivery_mode,
>> }
>>
>> trace_kvm_apicv_accept_irq(vcpu->vcpu_id, delivery_mode, trig_mode, vector);
>> +
>> + ++vcpu->stat.apicv_accept_irq;
>> +
>> if (in_guest_mode) {
>> /*
>> * Signal the doorbell to tell hardware to inject the IRQ. If
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index d4e6625e0a9a..f7db75ae2c55 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -4275,6 +4275,8 @@ static void vmx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
>> } else {
>> trace_kvm_apicv_accept_irq(vcpu->vcpu_id, delivery_mode,
>> trig_mode, vector);
>> +
>> + ++vcpu->stat.apicv_accept_irq;
>> }
>> }
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index f7f598f066e7..2ad70cf6e52c 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -304,6 +304,7 @@ const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
>> STATS_DESC_COUNTER(VCPU, preemption_other),
>> STATS_DESC_IBOOLEAN(VCPU, guest_mode),
>> STATS_DESC_COUNTER(VCPU, notify_window_exits),
>> + STATS_DESC_COUNTER(VCPU, apicv_accept_irq),
>> };
>>
>> const struct kvm_stats_header kvm_vcpu_stats_header = {
* Re: [RFC 0/3] Export APICv-related state via binary stats interface
2024-02-15 16:01 [RFC 0/3] Export APICv-related state via binary stats interface Alejandro Jimenez
` (2 preceding siblings ...)
2024-02-15 16:01 ` [RFC 3/3] x86: KVM: stats: Add a stat counter for GALog events Alejandro Jimenez
@ 2024-04-09 5:09 ` Vasant Hegde
2024-04-10 1:36 ` Alejandro Jimenez
2024-04-16 18:08 ` Sean Christopherson
4 siblings, 1 reply; 25+ messages in thread
From: Vasant Hegde @ 2024-04-09 5:09 UTC (permalink / raw)
To: Alejandro Jimenez, kvm
Cc: seanjc, pbonzini, linux-kernel, joao.m.martins, boris.ostrovsky,
mark.kanda, suravee.suthikulpanit, mlevitsk
Hi Alejandro,
On 2/15/2024 9:31 PM, Alejandro Jimenez wrote:
> The goal of this RFC is to agree on a mechanism for querying the state (and
> related stats) of APICv/AVIC. I clearly have an AVIC bias when approaching this
> topic since that is the side that I have mostly looked at, and has the greater
> number of possible inhibits, but I believe the argument applies for both
> vendor's technologies.
>
> Currently, a user or monitoring app trying to determine if APICv is actually
> being used needs implementation-specific knowlegde in order to look for specific
> types of #VMEXIT (i.e. AVIC_INCOMPLETE_IPI/AVIC_NOACCEL), checking GALog events
> by watching /proc/interrupts for AMD-Vi*-GA, etc. There are existing tracepoints
> (e.g. kvm_apicv_accept_irq, kvm_avic_ga_log) that make this task easier, but
> tracefs is not viable in some scenarios. Adding kvm debugfs entries has similar
> downsides. Suravee has previously proposed a new IOCTL interface[0] to expose
> this information, but there has not been any development in that direction.
> Sean has mentioned a preference for using BPF to extract info from the current
> tracepoints, which would require reworking existing structs to access some
> desired data, but as far as I know there isn't any work done on that approach
> yet.
>
> Recently Joao mentioned another alternative: the binary stats framework that is
> already supported by kernel[1] and QEMU[2]. This RFC has minimal code changes to
> expose the relevant info based on the existing data types the framework already
> supports. If there is consensus on using this approach, I can expand the fd
> stats subsystem to include other data types (e.g. a bitmap type for exposing the
> inhibit reasons), as well as adding documentation on KVM explaining which stats
> are relevant for APICv and how to query them.
Thanks for the series. IMO this approach makes sense. Maybe we should
consider adding one more stat to say whether AVIC is active or not. That
way, a consumer can:
1. Check whether AVIC is active or not.
2. If AVIC is active, check whether it is inhibited or not.
3. If not inhibited, use the other statistics.
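As a sketch of how that check sequence might look on the consumer side
('avic_active' being the additional stat suggested here, purely hypothetical;
'apicv_inhibited' is from patch 1):

```python
# Interpret VM-scope stats following the sequence above: first check
# whether AVIC is active, then whether it is inhibited.
def apicv_status(vm_stats):
    if not vm_stats.get("avic_active", 0):
        return "inactive"
    if vm_stats.get("apicv_inhibited", 0):
        return "inhibited"
    return "active"

print(apicv_status({"avic_active": 1, "apicv_inhibited": 0}))  # active
```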
I have reviewed/tested this series on an AMD Genoa platform. It looks good
to me.
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
-Vasant
>
> A basic example of retrieving the stats via qmp-shell, showing both a VM and
> per-vCPU case:
>
> # /usr/local/bin/qmp-shell --pretty ./qmp-sock
>
> (QEMU) query-stats target=vm providers=[{'provider':'kvm','names':['apicv_inhibited']}]
> {
> "return": [
> {
> "provider": "kvm",
> "stats": [
> {
> "name": "apicv_inhibited",
> "value": false
> }
> ]
> }
> ]
> }
>
> (QEMU) query-stats target=vcpu vcpus=['/machine/unattached/device[0]'] providers=[{'provider':'kvm','names':['apicv_accept_irq','ga_log_event']}]
> {
> "return": [
> {
> "provider": "kvm",
> "qom-path": "/machine/unattached/device[0]",
> "stats": [
> {
> "name": "ga_log_event",
> "value": 98
> },
> {
> "name": "apicv_accept_irq",
> "value": 166920
> }
> ]
> }
> ]
> }
>
> If other alternatives are preferred, please let's use this thread to discuss and
> I can take a shot at implementing the desired solution.
>
> Regards,
> Alejandro
>
> [0] https://lore.kernel.org/qemu-devel/7e0d22fa-b9b0-ad1a-3a37-a450ec5d73e8@amd.com/
> [1] https://lore.kernel.org/all/20210618222709.1858088-1-jingzhangos@google.com/
> [2] https://lore.kernel.org/qemu-devel/20220530150714.756954-1-pbonzini@redhat.com/
>
> Alejandro Jimenez (3):
> x86: KVM: stats: Add a stat to report status of APICv inhibition
> x86: KVM: stats: Add stat counter for IRQs injected via APICv
> x86: KVM: stats: Add a stat counter for GALog events
>
> arch/x86/include/asm/kvm_host.h | 3 +++
> arch/x86/kvm/svm/avic.c | 4 +++-
> arch/x86/kvm/svm/svm.c | 3 +++
> arch/x86/kvm/vmx/vmx.c | 2 ++
> arch/x86/kvm/x86.c | 12 +++++++++++-
> 5 files changed, 22 insertions(+), 2 deletions(-)
>
>
> base-commit: 7455665a3521aa7b56245c0a2810f748adc5fdd4
* Re: [RFC 3/3] x86: KVM: stats: Add a stat counter for GALog events
2024-02-15 16:01 ` [RFC 3/3] x86: KVM: stats: Add a stat counter for GALog events Alejandro Jimenez
@ 2024-04-09 6:45 ` Chao Gao
2024-04-10 1:31 ` Alejandro Jimenez
0 siblings, 1 reply; 25+ messages in thread
From: Chao Gao @ 2024-04-09 6:45 UTC (permalink / raw)
To: Alejandro Jimenez
Cc: kvm, seanjc, pbonzini, linux-kernel, joao.m.martins,
boris.ostrovsky, mark.kanda, suravee.suthikulpanit, mlevitsk
>diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
>index 4b74ea91f4e6..853cafe4a9af 100644
>--- a/arch/x86/kvm/svm/avic.c
>+++ b/arch/x86/kvm/svm/avic.c
>@@ -165,8 +165,10 @@ int avic_ga_log_notifier(u32 ga_tag)
> * bit in the vAPIC backing page. So, we just need to schedule
> * in the vcpu.
> */
>- if (vcpu)
>+ if (vcpu) {
> kvm_vcpu_wake_up(vcpu);
>+ ++vcpu->stat.ga_log_event;
>+ }
>
I am not sure why this is added for SVM only. It looks to me like GALog events
are similar to Intel IOMMU's wakeup events. Can we have a general name, maybe
iommu_wakeup_event, and increase the counter after the kvm_vcpu_wake_up() call
in pi_wakeup_handler()?
* Re: [RFC 1/3] x86: KVM: stats: Add a stat to report status of APICv inhibition
2024-02-15 16:01 ` [RFC 1/3] x86: KVM: stats: Add a stat to report status of APICv inhibition Alejandro Jimenez
@ 2024-04-09 14:54 ` Suthikulpanit, Suravee
2024-04-16 18:19 ` Sean Christopherson
1 sibling, 0 replies; 25+ messages in thread
From: Suthikulpanit, Suravee @ 2024-04-09 14:54 UTC (permalink / raw)
To: Alejandro Jimenez, kvm
Cc: seanjc, pbonzini, linux-kernel, joao.m.martins, boris.ostrovsky,
mark.kanda, mlevitsk
Hi
On 2/15/2024 11:01 PM, Alejandro Jimenez wrote:
> The inhibition status of APICv can currently be checked using the
> 'kvm_apicv_inhibit_changed' tracepoint, but this is not accessible if
> tracefs is not available (e.g. kernel lockdown, non-root user). Export
> inhibition status as a binary stat that can be monitored from userspace
> without elevated privileges.
>
> Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
> ---
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/x86.c | 10 +++++++++-
> 2 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index ad5319a503f0..9b960a523715 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1524,6 +1524,7 @@ struct kvm_vm_stat {
> u64 nx_lpage_splits;
> u64 max_mmu_page_hash_collisions;
> u64 max_mmu_rmap_size;
> + u64 apicv_inhibited;
> };
>
> struct kvm_vcpu_stat {
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index b66c45e7f6f8..f7f598f066e7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -255,7 +255,8 @@ const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
> STATS_DESC_ICOUNTER(VM, pages_1g),
> STATS_DESC_ICOUNTER(VM, nx_lpage_splits),
> STATS_DESC_PCOUNTER(VM, max_mmu_rmap_size),
> - STATS_DESC_PCOUNTER(VM, max_mmu_page_hash_collisions)
> + STATS_DESC_PCOUNTER(VM, max_mmu_page_hash_collisions),
> + STATS_DESC_IBOOLEAN(VM, apicv_inhibited)
One use for this stat would be to help:
1. Determine if the vm and/or vcpu is inhibited.
2. Determine the inhibit reason.
Therefore, can we use STATS_DESC_ICOUNTER() instead of
STATS_DESC_IBOOLEAN(), and show the inhibit flags from:
vm_reason = struct kvm->arch.apicv_inhibit_reasons
vcpu_reason = static_call(kvm_x86_vcpu_get_apicv_inhibit_reasons)(vcpu)
See kvm_vcpu_apicv_activated().
Thanks,
Suravee
> };
>
> const struct kvm_stats_header kvm_vm_stats_header = {
> @@ -10588,6 +10589,13 @@ void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
> */
> kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_UPDATE);
> kvm->arch.apicv_inhibit_reasons = new;
> +
> + /*
> + * Update inhibition statistic only when toggling APICv
> + * activation status.
> + */
> + kvm->stat.apicv_inhibited = !!new;
> +
> if (new) {
> unsigned long gfn = gpa_to_gfn(APIC_DEFAULT_PHYS_BASE);
> int idx = srcu_read_lock(&kvm->srcu);
* Re: [RFC 3/3] x86: KVM: stats: Add a stat counter for GALog events
2024-04-09 6:45 ` Chao Gao
@ 2024-04-10 1:31 ` Alejandro Jimenez
2024-04-12 10:45 ` Chao Gao
0 siblings, 1 reply; 25+ messages in thread
From: Alejandro Jimenez @ 2024-04-10 1:31 UTC (permalink / raw)
To: Chao Gao
Cc: kvm, seanjc, pbonzini, linux-kernel, joao.m.martins,
boris.ostrovsky, mark.kanda, suravee.suthikulpanit, mlevitsk
On 4/9/24 02:45, Chao Gao wrote:
>> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
>> index 4b74ea91f4e6..853cafe4a9af 100644
>> --- a/arch/x86/kvm/svm/avic.c
>> +++ b/arch/x86/kvm/svm/avic.c
>> @@ -165,8 +165,10 @@ int avic_ga_log_notifier(u32 ga_tag)
>> * bit in the vAPIC backing page. So, we just need to schedule
>> * in the vcpu.
>> */
>> - if (vcpu)
>> + if (vcpu) {
>> kvm_vcpu_wake_up(vcpu);
>> + ++vcpu->stat.ga_log_event;
>> + }
>>
>
> I am not sure why this is added for SVM only.
I am mostly familiar with AVIC, and much less so with VMX's PI, which is why I
am likely missing potential stats that could be useful to expose from the VMX
side. I'll be glad to implement any other suggestions you have.
> it looks to me GALog events are
> similar to Intel IOMMU's wakeup events. Can we have a general name? maybe
> iommu_wakeup_event
I believe that after:
d588bb9be1da ("KVM: VMX: enable IPI virtualization")
both the VT-d PI and the virtualized IPIs code paths will use POSTED_INTR_WAKEUP_VECTOR
for interrupts targeting a blocked vCPU. So on Intel hosts enabling IPI virtualization,
a counter incremented in pi_wakeup_handler() would record interrupts from both virtualized
IPIs and VT-d sources.
I don't think it is correct to generalize this counter since AMD's implementation is
different; when a blocked vCPU is targeted:
- by device interrupts, it uses the GA Log mechanism
- by an IPI, it generates an AVIC_INCOMPLETE_IPI #VMEXIT
If the reasoning above is correct, we can add a VMX-specific counter (vmx_pi_wakeup_event?)
that is incremented in pi_wakeup_handler() as you suggest, and document the difference
in behavior so that it is not confused as equivalent to the ga_log_event counter.
An alternative if we'd like to have a common 'iommu_wakeup_event' is to add filtering on
pi_wakeup_handler() and only increment the common counter if IPI virtualization is not
enabled (i.e. !vmx_can_use_ipiv()), in which case we'd only handle device interrupts
and it becomes the parallel case to GA Log events.
That leaves us with either a VMX-specific counter (vmx_pi_wakeup_event) that does not
distinguish between interrupt sources when IPI virtualization is enabled, or, when it is
disabled, a common/generic counter (iommu_wakeup_event) that applies to both vendors.
Please let me know if you agree with this approach or have other suggestions.
Thank you,
Alejandro
>
> and increase the counter after the kvm_vcpu_wake_up() call in pi_wakeup_handler().
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC 0/3] Export APICv-related state via binary stats interface
2024-04-09 5:09 ` [RFC 0/3] Export APICv-related state via binary stats interface Vasant Hegde
@ 2024-04-10 1:36 ` Alejandro Jimenez
0 siblings, 0 replies; 25+ messages in thread
From: Alejandro Jimenez @ 2024-04-10 1:36 UTC (permalink / raw)
To: Vasant Hegde, kvm
Cc: seanjc, pbonzini, linux-kernel, joao.m.martins, boris.ostrovsky,
mark.kanda, suravee.suthikulpanit, mlevitsk
On 4/9/24 01:09, Vasant Hegde wrote:
> Hi Alejandro,
>
> On 2/15/2024 9:31 PM, Alejandro Jimenez wrote:
>> The goal of this RFC is to agree on a mechanism for querying the state (and
>> related stats) of APICv/AVIC. I clearly have an AVIC bias when approaching this
>> topic since that is the side that I have mostly looked at, and has the greater
>> number of possible inhibits, but I believe the argument applies for both
>> vendor's technologies.
>>
>> Currently, a user or monitoring app trying to determine if APICv is actually
>> being used needs implementation-specific knowledge in order to look for specific
>> types of #VMEXIT (i.e. AVIC_INCOMPLETE_IPI/AVIC_NOACCEL), checking GALog events
>> by watching /proc/interrupts for AMD-Vi*-GA, etc. There are existing tracepoints
>> (e.g. kvm_apicv_accept_irq, kvm_avic_ga_log) that make this task easier, but
>> tracefs is not viable in some scenarios. Adding kvm debugfs entries has similar
>> downsides. Suravee has previously proposed a new IOCTL interface[0] to expose
>> this information, but there has not been any development in that direction.
>> Sean has mentioned a preference for using BPF to extract info from the current
>> tracepoints, which would require reworking existing structs to access some
>> desired data, but as far as I know there isn't any work done on that approach
>> yet.
>>
>> Recently Joao mentioned another alternative: the binary stats framework that is
>> already supported by kernel[1] and QEMU[2]. This RFC has minimal code changes to
>> expose the relevant info based on the existing data types the framework already
>> supports. If there is consensus on using this approach, I can expand the fd
>> stats subsystem to include other data types (e.g. a bitmap type for exposing the
>> inhibit reasons), as well as adding documentation on KVM explaining which stats
>> are relevant for APICv and how to query them.
>
> Thanks for the series. IMO this approach makes sense. Maybe we should consider adding one more stat to say whether AVIC is active or not. That way,
> Check whether AVIC is active or not.
> If AVIC is active, then inhibited or not
> If not inhibited, then use other statistics info.
Hi Vasant,
Thank you for reviewing/testing. I'll implement your suggestion and send it on the next revision.
Alejandro
>
>
> I have reviewed/tested this series on AMD Genoa platform. It looks good to me.
>
> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
>
> -Vasant
>
>>
>> A basic example of retrieving the stats via qmp-shell, showing both a VM and
>> per-vCPU case:
>>
>> # /usr/local/bin/qmp-shell --pretty ./qmp-sock
>>
>> (QEMU) query-stats target=vm providers=[{'provider':'kvm','names':['apicv_inhibited']}]
>> {
>> "return": [
>> {
>> "provider": "kvm",
>> "stats": [
>> {
>> "name": "apicv_inhibited",
>> "value": false
>> }
>> ]
>> }
>> ]
>> }
>>
>> (QEMU) query-stats target=vcpu vcpus=['/machine/unattached/device[0]'] providers=[{'provider':'kvm','names':['apicv_accept_irq','ga_log_event']}]
>> {
>> "return": [
>> {
>> "provider": "kvm",
>> "qom-path": "/machine/unattached/device[0]",
>> "stats": [
>> {
>> "name": "ga_log_event",
>> "value": 98
>> },
>> {
>> "name": "apicv_accept_irq",
>> "value": 166920
>> }
>> ]
>> }
>> ]
>> }
>>
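For anyone who wants to script the same queries without qmp-shell, a minimal sketch of building the raw QMP commands could look like the following. The argument names follow QEMU's QAPI schema (qapi/stats.json) as I read it; actually sending the command requires a live QMP socket and the usual qmp_capabilities handshake, which the sketch deliberately does not perform.

```python
import json

def query_stats_cmd(target, names, vcpus=None):
    # Build a QMP 'query-stats' command equivalent to the qmp-shell
    # invocations above, restricted to the 'kvm' provider.
    args = {
        "target": target,
        "providers": [{"provider": "kvm", "names": names}],
    }
    if vcpus is not None:
        args["vcpus"] = vcpus
    return {"execute": "query-stats", "arguments": args}

# VM-wide query for the proposed inhibition stat.
vm_cmd = query_stats_cmd("vm", ["apicv_inhibited"])

# Per-vCPU query mirroring the second example above.
vcpu_cmd = query_stats_cmd("vcpu", ["apicv_accept_irq", "ga_log_event"],
                           vcpus=["/machine/unattached/device[0]"])

# This JSON line is what would be written to the QMP socket.
wire = json.dumps(vm_cmd)
```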
>> If other alternatives are preferred, please let's use this thread to discuss and
>> I can take a shot at implementing the desired solution.
>>
>> Regards,
>> Alejandro
>>
>> [0] https://lore.kernel.org/qemu-devel/7e0d22fa-b9b0-ad1a-3a37-a450ec5d73e8@amd.com/
>> [1] https://lore.kernel.org/all/20210618222709.1858088-1-jingzhangos@google.com/
>> [2] https://lore.kernel.org/qemu-devel/20220530150714.756954-1-pbonzini@redhat.com/
>>
>> Alejandro Jimenez (3):
>> x86: KVM: stats: Add a stat to report status of APICv inhibition
>> x86: KVM: stats: Add stat counter for IRQs injected via APICv
>> x86: KVM: stats: Add a stat counter for GALog events
>>
>> arch/x86/include/asm/kvm_host.h | 3 +++
>> arch/x86/kvm/svm/avic.c | 4 +++-
>> arch/x86/kvm/svm/svm.c | 3 +++
>> arch/x86/kvm/vmx/vmx.c | 2 ++
>> arch/x86/kvm/x86.c | 12 +++++++++++-
>> 5 files changed, 22 insertions(+), 2 deletions(-)
>>
>>
>> base-commit: 7455665a3521aa7b56245c0a2810f748adc5fdd4
>
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC 3/3] x86: KVM: stats: Add a stat counter for GALog events
2024-04-10 1:31 ` Alejandro Jimenez
@ 2024-04-12 10:45 ` Chao Gao
2024-04-16 18:35 ` Sean Christopherson
0 siblings, 1 reply; 25+ messages in thread
From: Chao Gao @ 2024-04-12 10:45 UTC (permalink / raw)
To: Alejandro Jimenez
Cc: kvm, seanjc, pbonzini, linux-kernel, joao.m.martins,
boris.ostrovsky, mark.kanda, suravee.suthikulpanit, mlevitsk
On Tue, Apr 09, 2024 at 09:31:45PM -0400, Alejandro Jimenez wrote:
>
>On 4/9/24 02:45, Chao Gao wrote:
>> > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
>> > index 4b74ea91f4e6..853cafe4a9af 100644
>> > --- a/arch/x86/kvm/svm/avic.c
>> > +++ b/arch/x86/kvm/svm/avic.c
>> > @@ -165,8 +165,10 @@ int avic_ga_log_notifier(u32 ga_tag)
>> > * bit in the vAPIC backing page. So, we just need to schedule
>> > * in the vcpu.
>> > */
>> > - if (vcpu)
>> > + if (vcpu) {
>> > kvm_vcpu_wake_up(vcpu);
>> > + ++vcpu->stat.ga_log_event;
>> > + }
>> >
>>
>> I am not sure why this is added for SVM only.
>
>I am mostly familiar with AVIC, and much less so with VMX's PI, so this is
>why I am likely missing potential stats that could be useful to expose from
>the VMX side. I'll be glad to implement any other suggestions you have.
>
>
>> it looks to me GALog events are
>> similar to Intel IOMMU's wakeup events. Can we have a general name? maybe
>> iommu_wakeup_event
>
>I believe that after:
>d588bb9be1da ("KVM: VMX: enable IPI virtualization")
>
>both the VT-d PI and the virtualized IPIs code paths will use POSTED_INTR_WAKEUP_VECTOR
>for interrupts targeting a blocked vCPU. So on Intel hosts enabling IPI virtualization,
>a counter incremented in pi_wakeup_handler() would record interrupts from both virtualized
>IPIs and VT-d sources.
>
>I don't think it is correct to generalize this counter since AMD's implementation is
>different; when a blocked vCPU is targeted:
>
>- by device interrupts, it uses the GA Log mechanism
>- by an IPI, it generates an AVIC_INCOMPLETE_IPI #VMEXIT
>
>If the reasoning above is correct, we can add a VMX-specific counter (vmx_pi_wakeup_event?)
>that is incremented in pi_wakeup_handler() as you suggest, and document the difference
>in behavior so that it is not confused as equivalent to the ga_log_event counter.
Correct. If we cannot generalize the counter, I think it is ok to
add the counter for SVM only. Thank you for the clarification.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC 0/3] Export APICv-related state via binary stats interface
2024-02-15 16:01 [RFC 0/3] Export APICv-related state via binary stats interface Alejandro Jimenez
` (3 preceding siblings ...)
2024-04-09 5:09 ` [RFC 0/3] Export APICv-related state via binary stats interface Vasant Hegde
@ 2024-04-16 18:08 ` Sean Christopherson
2024-04-16 19:51 ` Paolo Bonzini
4 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2024-04-16 18:08 UTC (permalink / raw)
To: Alejandro Jimenez
Cc: kvm, pbonzini, linux-kernel, joao.m.martins, boris.ostrovsky,
mark.kanda, suravee.suthikulpanit, mlevitsk
On Thu, Feb 15, 2024, Alejandro Jimenez wrote:
> The goal of this RFC is to agree on a mechanism for querying the state (and
> related stats) of APICv/AVIC. I clearly have an AVIC bias when approaching this
> topic since that is the side that I have mostly looked at, and has the greater
> number of possible inhibits, but I believe the argument applies for both
> vendor's technologies.
>
> Currently, a user or monitoring app trying to determine if APICv is actually
> being used needs implementation-specific knowledge in order to look for specific
> types of #VMEXIT (i.e. AVIC_INCOMPLETE_IPI/AVIC_NOACCEL), checking GALog events
> by watching /proc/interrupts for AMD-Vi*-GA, etc. There are existing tracepoints
> (e.g. kvm_apicv_accept_irq, kvm_avic_ga_log) that make this task easier, but
> tracefs is not viable in some scenarios. Adding kvm debugfs entries has similar
> downsides. Suravee has previously proposed a new IOCTL interface[0] to expose
> this information, but there has not been any development in that direction.
> Sean has mentioned a preference for using BPF to extract info from the current
> tracepoints, which would require reworking existing structs to access some
> desired data, but as far as I know there isn't any work done on that approach
> yet.
>
> Recently Joao mentioned another alternative: the binary stats framework that is
> already supported by kernel[1] and QEMU[2].
The hiccup with stats is that they are ABI, e.g. we can't (easily) ditch stats
once they're added, and KVM needs to maintain the exact behavior.
Tracepoints are explicitly not ABI, and so we can be much more permissive when it
comes to adding/expanding tracepoints, specifically because there are no guarantees
provided to userspace.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC 1/3] x86: KVM: stats: Add a stat to report status of APICv inhibition
2024-02-15 16:01 ` [RFC 1/3] x86: KVM: stats: Add a stat to report status of APICv inhibition Alejandro Jimenez
2024-04-09 14:54 ` Suthikulpanit, Suravee
@ 2024-04-16 18:19 ` Sean Christopherson
2024-04-16 19:53 ` Paolo Bonzini
2024-04-16 19:59 ` Alejandro Jimenez
1 sibling, 2 replies; 25+ messages in thread
From: Sean Christopherson @ 2024-04-16 18:19 UTC (permalink / raw)
To: Alejandro Jimenez
Cc: kvm, pbonzini, linux-kernel, joao.m.martins, boris.ostrovsky,
mark.kanda, suravee.suthikulpanit, mlevitsk
On Thu, Feb 15, 2024, Alejandro Jimenez wrote:
> The inhibition status of APICv can currently be checked using the
> 'kvm_apicv_inhibit_changed' tracepoint, but this is not accessible if
> tracefs is not available (e.g. kernel lockdown, non-root user). Export
> inhibition status as a binary stat that can be monitored from userspace
> without elevated privileges.
>
> Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
> ---
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/x86.c | 10 +++++++++-
> 2 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index ad5319a503f0..9b960a523715 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1524,6 +1524,7 @@ struct kvm_vm_stat {
> u64 nx_lpage_splits;
> u64 max_mmu_page_hash_collisions;
> u64 max_mmu_rmap_size;
> + u64 apicv_inhibited;
Tracking the negative is odd, i.e. if we add a stat, KVM should probably track
if APICv is fully enabled, not if it's inhibited.
This also should be a boolean, not a u64. Precisely enumerating _why_ APICv is
inhibited is firmly in debug territory, i.e. not in scope for "official" stats.
Oh, and this should be a per-vCPU stat, not a VM-wide stat.
As for whether or not we should add a stat for this, I'm leaning towards "yes".
APICv can have such a profound impact on performance (and functionality) that
definitively knowing that it's enabled seems justified.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC 2/3] x86: KVM: stats: Add stat counter for IRQs injected via APICv
2024-02-15 18:12 ` Alejandro Jimenez
@ 2024-04-16 18:25 ` Sean Christopherson
0 siblings, 0 replies; 25+ messages in thread
From: Sean Christopherson @ 2024-04-16 18:25 UTC (permalink / raw)
To: Alejandro Jimenez
Cc: Dongli Zhang, kvm, pbonzini, linux-kernel, joao.m.martins,
boris.ostrovsky, mark.kanda, suravee.suthikulpanit, mlevitsk
On Thu, Feb 15, 2024, Alejandro Jimenez wrote:
> Hi Dongli
>
> On 2/15/24 11:16, Dongli Zhang wrote:
> > Hi Alejandro,
> >
> > Is there any use case of this counter in the bug?
>
> I don't have a specific bug in mind that this is trying to address. This
> patch is just an example is to show how existing data points (i.e. the
> trace_kvm_apicv_accept_irq tracepoint) can also be exposed via the stats
> framework with minimal overhead, and to support the point in the cover letter
> that querying the binary stats could be the best choice for a "single source"
> that tells us the full status of APICv/AVIC (i.e. are SVM and IOMMU AVIC both
> working, are there any inhibits set, etc.)
Yeah, but as noted in my response to the cover letter, stats are ABI, whereas
tracepoints are not, i.e. the bar for adding stats is much higher than the bar
for adding tracepoints.
In other words, stats need to come with a concrete use case (preferably more than
one), an explanation of why userspace needs a KVM-provided stat, and a decent
level of confidence that KVM can provide deterministic, sane, and broadly useful
data.
E.g. this proposed stat is of limited usefulness because it applies to a very
narrow combination of IRQs and hardware.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC 3/3] x86: KVM: stats: Add a stat counter for GALog events
2024-04-12 10:45 ` Chao Gao
@ 2024-04-16 18:35 ` Sean Christopherson
2024-04-24 0:50 ` Alejandro Jimenez
0 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2024-04-16 18:35 UTC (permalink / raw)
To: Chao Gao
Cc: Alejandro Jimenez, kvm, pbonzini, linux-kernel, joao.m.martins,
boris.ostrovsky, mark.kanda, suravee.suthikulpanit, mlevitsk
On Fri, Apr 12, 2024, Chao Gao wrote:
> On Tue, Apr 09, 2024 at 09:31:45PM -0400, Alejandro Jimenez wrote:
> >
> >On 4/9/24 02:45, Chao Gao wrote:
> >> > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> >> > index 4b74ea91f4e6..853cafe4a9af 100644
> >> > --- a/arch/x86/kvm/svm/avic.c
> >> > +++ b/arch/x86/kvm/svm/avic.c
> >> > @@ -165,8 +165,10 @@ int avic_ga_log_notifier(u32 ga_tag)
> >> > * bit in the vAPIC backing page. So, we just need to schedule
> >> > * in the vcpu.
> >> > */
> >> > - if (vcpu)
> >> > + if (vcpu) {
> >> > kvm_vcpu_wake_up(vcpu);
> >> > + ++vcpu->stat.ga_log_event;
> >> > + }
> >> >
> >>
> >> I am not sure why this is added for SVM only.
> >
> >I am mostly familiar with AVIC, and much less so with VMX's PI, so this is
> >why I am likely missing potential stats that could be useful to expose from
> >the VMX side. I'll be glad to implement any other suggestions you have.
> >
> >
> >> it looks to me GALog events are
> >> similar to Intel IOMMU's wakeup events. Can we have a general name? maybe
> >> iommu_wakeup_event
> >
> >I believe that after:
> >d588bb9be1da ("KVM: VMX: enable IPI virtualization")
> >
> >both the VT-d PI and the virtualized IPIs code paths will use POSTED_INTR_WAKEUP_VECTOR
> >for interrupts targeting a blocked vCPU. So on Intel hosts enabling IPI virtualization,
> >a counter incremented in pi_wakeup_handler() would record interrupts from both virtualized
> >IPIs and VT-d sources.
> >
> >I don't think it is correct to generalize this counter since AMD's implementation is
> >different; when a blocked vCPU is targeted:
> >
> >- by device interrupts, it uses the GA Log mechanism
> >- by an IPI, it generates an AVIC_INCOMPLETE_IPI #VMEXIT
> >
> >If the reasoning above is correct, we can add a VMX-specific counter (vmx_pi_wakeup_event?)
> >that is incremented in pi_wakeup_handler() as you suggest, and document the difference
> >in behavior so that it is not confused as equivalent to the ga_log_event counter.
>
> Correct. If we cannot generalize the counter, I think it is ok to
> add the counter for SVM only. Thank you for the clarification.
There's already a generic stat, halt_wakeup, that more or less covers this case.
And despite what the comment says, avic_ga_log_notifier() does NOT schedule in
the task; kvm_vcpu_wake_up() only wakes up blocking vCPUs, no more, no less.
I'm also not at all convinced that KVM needs to differentiate between IPIs and
device interrupts that arrive when the vCPU isn't in the guest. E.g. this can
kinda sorta be used to confirm IRQ affinity, but if the vCPU is happily running
in the guest, such a heuristic will get false negatives.
And for confirming that GA logging is working, that's more or less covered by the
proposed APICv stat. If AVIC is enabled, the VM has assigned devices, and GA logging
*isn't* working, then you'll probably find out quite quickly because the VM will
have a lot of missed interrupts, e.g. vCPUs will get stuck in HLT.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC 0/3] Export APICv-related state via binary stats interface
2024-04-16 18:08 ` Sean Christopherson
@ 2024-04-16 19:51 ` Paolo Bonzini
2024-04-16 21:29 ` Alejandro Jimenez
2024-04-16 22:34 ` Sean Christopherson
0 siblings, 2 replies; 25+ messages in thread
From: Paolo Bonzini @ 2024-04-16 19:51 UTC (permalink / raw)
To: Sean Christopherson
Cc: Alejandro Jimenez, kvm, linux-kernel, joao.m.martins,
boris.ostrovsky, mark.kanda, suravee.suthikulpanit, mlevitsk
On Tue, Apr 16, 2024 at 8:08 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Feb 15, 2024, Alejandro Jimenez wrote:
> > The goal of this RFC is to agree on a mechanism for querying the state (and
> > related stats) of APICv/AVIC. I clearly have an AVIC bias when approaching this
> > topic since that is the side that I have mostly looked at, and has the greater
> > number of possible inhibits, but I believe the argument applies for both
> > vendor's technologies.
> >
> > Currently, a user or monitoring app trying to determine if APICv is actually
> > being used needs implementation-specific knowledge in order to look for specific
> > types of #VMEXIT (i.e. AVIC_INCOMPLETE_IPI/AVIC_NOACCEL), checking GALog events
> > by watching /proc/interrupts for AMD-Vi*-GA, etc. There are existing tracepoints
> > (e.g. kvm_apicv_accept_irq, kvm_avic_ga_log) that make this task easier, but
> > tracefs is not viable in some scenarios. Adding kvm debugfs entries has similar
> > downsides. Suravee has previously proposed a new IOCTL interface[0] to expose
> > this information, but there has not been any development in that direction.
> > Sean has mentioned a preference for using BPF to extract info from the current
> > tracepoints, which would require reworking existing structs to access some
> > desired data, but as far as I know there isn't any work done on that approach
> > yet.
> >
> > Recently Joao mentioned another alternative: the binary stats framework that is
> > already supported by kernel[1] and QEMU[2].
>
> The hiccup with stats is that they are ABI, e.g. we can't (easily) ditch stats
> once they're added, and KVM needs to maintain the exact behavior.
Stats are not ABI---why would they be? They have an established
meaning and it's not a good idea to change it, but it's not an
absolute no-no(*); and removing them is not a problem at all.
For example, if stats were ABI, there would be no need for the
introspection mechanism, you could just use a struct like ethtool
stats (which *are* ABI).
Not everything makes a good stat but, if in doubt and it's cheap
enough to collect it, go ahead and add it. Cheap collection is the
important point, because tracepoints in a hot path can be so expensive
as to slow down the guest substantially, at least in microbenchmarks.
In this case I'm not sure _all_ inhibits makes sense and I certainly
wouldn't want a bitmask, but a generic APICv-enabled stat certainly
makes sense, and perhaps another for a weirdly-configured local APIC.
Paolo
(*) you have to draw a line somewhere. New processor models may
virtualize parts of the CPU in such a way that some stats become
meaningless or just stay at zero. Should KVM not support those
features because it is not possible anymore to introspect the guest
through stats?
> Tracepoints are explicitly not ABI, and so we can be much more permissive when it
> comes to adding/expanding tracepoints, specifically because there are no guarantees
> provided to userspace.
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC 1/3] x86: KVM: stats: Add a stat to report status of APICv inhibition
2024-04-16 18:19 ` Sean Christopherson
@ 2024-04-16 19:53 ` Paolo Bonzini
2024-04-16 19:59 ` Alejandro Jimenez
1 sibling, 0 replies; 25+ messages in thread
From: Paolo Bonzini @ 2024-04-16 19:53 UTC (permalink / raw)
To: Sean Christopherson
Cc: Alejandro Jimenez, kvm, linux-kernel, joao.m.martins,
boris.ostrovsky, mark.kanda, suravee.suthikulpanit, mlevitsk
On Tue, Apr 16, 2024 at 8:19 PM Sean Christopherson <seanjc@google.com> wrote:
>
> > + u64 apicv_inhibited;
>
> Tracking the negative is odd, i.e. if we add a stat, KVM should probably track
> if APICv is fully enabled, not if it's inhibited.
>
> This also should be a boolean, not a u64. Precisely enumerating _why_ APICv is
> inhibited is firmly in debug territory, i.e. not in scope for "official" stats.
It is a boolean, but stats are all u64 in the file and the contents of
the file map the stats struct directly. See for example the existing
'u64 guest_mode'.
Paolo
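To illustrate Paolo's point with a sketch: every entry in the stats data block, boolean or not, occupies a 64-bit little-endian slot; only the descriptor's unit flag tells the consumer to render it as true/false, which is what QEMU does when it reports "value": false for apicv_inhibited. The KVM_STATS_UNIT_BOOLEAN value below is taken from include/uapi/linux/kvm.h as I read it, so treat it as illustrative.

```python
import struct

KVM_STATS_UNIT_BOOLEAN = 0x4 << 4   # unit field in kvm_stats_desc.flags

def decode_stat(raw_u64, desc_flags):
    # A reader checks the descriptor's unit bits to decide whether the
    # 64-bit slot should be presented as a bool or as a plain number.
    if (desc_flags & 0xF0) == KVM_STATS_UNIT_BOOLEAN:
        return raw_u64 != 0
    return raw_u64

# Two adjacent u64 slots, as they would appear in the data block:
# a boolean-typed stat followed by an ordinary counter.
data = struct.pack("<QQ", 1, 166920)
bool_slot, counter_slot = struct.unpack("<QQ", data)
as_bool = decode_stat(bool_slot, KVM_STATS_UNIT_BOOLEAN)
as_counter = decode_stat(counter_slot, 0)
```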
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC 1/3] x86: KVM: stats: Add a stat to report status of APICv inhibition
2024-04-16 18:19 ` Sean Christopherson
2024-04-16 19:53 ` Paolo Bonzini
@ 2024-04-16 19:59 ` Alejandro Jimenez
1 sibling, 0 replies; 25+ messages in thread
From: Alejandro Jimenez @ 2024-04-16 19:59 UTC (permalink / raw)
To: Sean Christopherson
Cc: kvm, pbonzini, linux-kernel, joao.m.martins, boris.ostrovsky,
mark.kanda, suravee.suthikulpanit, mlevitsk
On 4/16/24 14:19, Sean Christopherson wrote:
> On Thu, Feb 15, 2024, Alejandro Jimenez wrote:
>> The inhibition status of APICv can currently be checked using the
>> 'kvm_apicv_inhibit_changed' tracepoint, but this is not accessible if
>> tracefs is not available (e.g. kernel lockdown, non-root user). Export
>> inhibition status as a binary stat that can be monitored from userspace
>> without elevated privileges.
>>
>> Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
>> ---
>> arch/x86/include/asm/kvm_host.h | 1 +
>> arch/x86/kvm/x86.c | 10 +++++++++-
>> 2 files changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index ad5319a503f0..9b960a523715 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -1524,6 +1524,7 @@ struct kvm_vm_stat {
>> u64 nx_lpage_splits;
>> u64 max_mmu_page_hash_collisions;
>> u64 max_mmu_rmap_size;
>> + u64 apicv_inhibited;
I was about to send v2 based on the earlier feedback. I think the changes would
partially address your comments, but there are still wrinkles. In short, I ended
up with:
per-vCPU:
apicv_active (bool) --> tracks vcpu apic.apicv_active
per-VM:
apicv_inhibited (u64) --> exposes kvm apicv_inhibit_reasons
>
> Tracking the negative is odd, i.e. if we add a stat, KVM should probably track
> if APICv is fully enabled, not if it's inhibited.
> This also should be a boolean, not a u64. Precisely enumerating _why_ APICv is
> inhibited is firmly in debug territory, i.e. not in scope for "official" stats.
From that perspective I am perhaps stretching the stats' official purpose
by exposing "too much info", showing _why_ APICv is inhibited (i.e. the new
VM-wide apicv_inhibited).
It is true that I am approaching this with a "debugging" bias, but _if_ we do
expose a stat related to inhibition state, I don't think it would be overloading
its meaning to encode relevant inhibit reason information on it.
It would need to be documented, of course, which is something I can do once
we reach an agreement.
>
> Oh, and this should be a per-vCPU stat, not a VM-wide stat.
>
> As for whether or not we should add a stat for this, I'm leaning towards "yes".
> APICv can have such a profound impact on performance (and functionality) that
> definitively knowing that it's enabled seems justified.
The new per-vCPU apicv_active stat fills this role.
I see you don't agree with a separate stat exposing VM-wide inhibits due to scope
and ABI restrictions mentioned here and in the cover letter reply. I follow the
argument, but it also seems like we'd be handicapping this interface by not
providing inhibit reasons along with it, since they are essential in determining
whether or not AVIC in particular is working (again my debugging bias). I am aware
this is a slight overloading (hopefully not abuse) of the stats' purpose, and so
it might not be acceptable.
Alejandro
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC 0/3] Export APICv-related state via binary stats interface
2024-04-16 19:51 ` Paolo Bonzini
@ 2024-04-16 21:29 ` Alejandro Jimenez
2024-04-16 21:36 ` Paolo Bonzini
2024-04-16 22:34 ` Sean Christopherson
1 sibling, 1 reply; 25+ messages in thread
From: Alejandro Jimenez @ 2024-04-16 21:29 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, joao.m.martins, boris.ostrovsky, mark.kanda,
suravee.suthikulpanit, mlevitsk
On 4/16/24 15:51, Paolo Bonzini wrote:
> On Tue, Apr 16, 2024 at 8:08 PM Sean Christopherson <seanjc@google.com> wrote:
>>
>> On Thu, Feb 15, 2024, Alejandro Jimenez wrote:
>>> The goal of this RFC is to agree on a mechanism for querying the state (and
>>> related stats) of APICv/AVIC. I clearly have an AVIC bias when approaching this
>>> topic since that is the side that I have mostly looked at, and has the greater
>>> number of possible inhibits, but I believe the argument applies for both
>>> vendor's technologies.
>>>
>>> Currently, a user or monitoring app trying to determine if APICv is actually
>>> being used needs implementation-specific knowledge in order to look for specific
>>> types of #VMEXIT (i.e. AVIC_INCOMPLETE_IPI/AVIC_NOACCEL), checking GALog events
>>> by watching /proc/interrupts for AMD-Vi*-GA, etc. There are existing tracepoints
>>> (e.g. kvm_apicv_accept_irq, kvm_avic_ga_log) that make this task easier, but
>>> tracefs is not viable in some scenarios. Adding kvm debugfs entries has similar
>>> downsides. Suravee has previously proposed a new IOCTL interface[0] to expose
>>> this information, but there has not been any development in that direction.
>>> Sean has mentioned a preference for using BPF to extract info from the current
>>> tracepoints, which would require reworking existing structs to access some
>>> desired data, but as far as I know there isn't any work done on that approach
>>> yet.
>>>
>>> Recently Joao mentioned another alternative: the binary stats framework that is
>>> already supported by kernel[1] and QEMU[2].
>>
>> The hiccup with stats is that they are ABI, e.g. we can't (easily) ditch stats
>> once they're added, and KVM needs to maintain the exact behavior.
>
> Stats are not ABI---why would they be? They have an established
> meaning and it's not a good idea to change it, but it's not an
> absolute no-no(*); and removing them is not a problem at all.
>
> For example, if stats were ABI, there would be no need for the
> introspection mechanism, you could just use a struct like ethtool
> stats (which *are* ABO).
>
> Not everything makes a good stat but, if in doubt and it's cheap
> enough to collect it, go ahead and add it. Cheap collection is the
> important point, because tracepoints in a hot path can be so expensive
> as to slow down the guest substantially, at least in microbenchmarks.
>
> In this case I'm not sure _all_ inhibits makes sense and I certainly
> wouldn't want a bitmask,
I wanted to be able to query enough info via stats (i.e. is APICv active, and if
not, why is it inhibited?) that is otherwise exposed only via interfaces which are
not always available. That unfortunately requires a bit of "overloading" of
the stat as I mentioned earlier, but it remains cheap to collect and expose.
What would be your preferred interface to provide the (complete) APICv state?
> but a generic APICv-enabled stat certainly
> makes sense, and perhaps another for a weirdly-configured local APIC.
Can you expand on what you mean by "weirdly-configured"? Do you mean something
like virtual wire mode?
Alejandro
>
> Paolo
>
> (*) you have to draw a line somewhere. New processor models may
> virtualize parts of the CPU in such a way that some stats become
> meaningless or just stay at zero. Should KVM not support those
> features because it is not possible anymore to introspect the guest
> through stats?
>
>> Tracepoints are explicitly not ABI, and so we can be much more permissive when it
>> comes to adding/expanding tracepoints, specifically because there are no guarantees
>> provided to userspace.
>>
>
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC 0/3] Export APICv-related state via binary stats interface
2024-04-16 21:29 ` Alejandro Jimenez
@ 2024-04-16 21:36 ` Paolo Bonzini
0 siblings, 0 replies; 25+ messages in thread
From: Paolo Bonzini @ 2024-04-16 21:36 UTC (permalink / raw)
To: Alejandro Jimenez
Cc: Sean Christopherson, kvm, linux-kernel, joao.m.martins,
boris.ostrovsky, mark.kanda, suravee.suthikulpanit, mlevitsk
On Tue, Apr 16, 2024 at 11:29 PM Alejandro Jimenez
<alejandro.j.jimenez@oracle.com> wrote:
>
>
>
> On 4/16/24 15:51, Paolo Bonzini wrote:
> > On Tue, Apr 16, 2024 at 8:08 PM Sean Christopherson <seanjc@google.com> wrote:
> >>
> >> On Thu, Feb 15, 2024, Alejandro Jimenez wrote:
> >>> The goal of this RFC is to agree on a mechanism for querying the state (and
> >>> related stats) of APICv/AVIC. I clearly have an AVIC bias when approaching this
> >>> topic since that is the side that I have mostly looked at, and has the greater
> >>> number of possible inhibits, but I believe the argument applies for both
> >>> vendor's technologies.
> >>>
> >>> Currently, a user or monitoring app trying to determine if APICv is actually
> >>> being used needs implementation-specific knowledge in order to look for specific
> >>> types of #VMEXIT (i.e. AVIC_INCOMPLETE_IPI/AVIC_NOACCEL), checking GALog events
> >>> by watching /proc/interrupts for AMD-Vi*-GA, etc. There are existing tracepoints
> >>> (e.g. kvm_apicv_accept_irq, kvm_avic_ga_log) that make this task easier, but
> >>> tracefs is not viable in some scenarios. Adding kvm debugfs entries has similar
> >>> downsides. Suravee has previously proposed a new IOCTL interface[0] to expose
> >>> this information, but there has not been any development in that direction.
> >>> Sean has mentioned a preference for using BPF to extract info from the current
> >>> tracepoints, which would require reworking existing structs to access some
> >>> desired data, but as far as I know there isn't any work done on that approach
> >>> yet.
> >>>
> >>> Recently Joao mentioned another alternative: the binary stats framework that is
> >>> already supported by kernel[1] and QEMU[2].
> >>
> >> The hiccup with stats is that they are ABI, e.g. we can't (easily) ditch stats
> >> once they're added, and KVM needs to maintain the exact behavior.
> >
> > Stats are not ABI---why would they be? They have an established
> > meaning and it's not a good idea to change it, but it's not an
> > absolute no-no(*); and removing them is not a problem at all.
> >
> > For example, if stats were ABI, there would be no need for the
> > introspection mechanism, you could just use a struct like ethtool
> > stats (which *are* ABI).
> >
> > Not everything makes a good stat but, if in doubt and it's cheap
> > enough to collect it, go ahead and add it. Cheap collection is the
> > important point, because tracepoints in a hot path can be so expensive
> > as to slow down the guest substantially, at least in microbenchmarks.
> >
> > In this case I'm not sure _all_ inhibits make sense and I certainly
> > wouldn't want a bitmask,
>
> I wanted to be able to query enough info via stats (i.e. is APICv active, and if
> not, why is it inhibited?) that is exposed via the other interfaces which are not
> always available. That unfortunately requires a bit of "overloading" of
> the stat as I mentioned earlier, but it remains cheap to collect and expose.
>
> What would be your preferred interface to provide the (complete) APICv state?
>
> > but a generic APICv-enabled stat certainly
> > makes sense, and perhaps another for a weirdly-configured local APIC.
>
> Can you expand on what you mean by "weirdly-configured"? Do you mean something
> like virtual wire mode?
I mean any of
APICV_INHIBIT_REASON_HYPERV,
APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED,
APICV_INHIBIT_REASON_APIC_ID_MODIFIED,
APICV_INHIBIT_REASON_APIC_BASE_MODIFIED,
APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED,
which in practice are always going to be APICV_INHIBIT_REASON_HYPERV
on 99.99% of the guests.
ExtINT is visible via irq_window_exits; if you are not running nested,
do not have the problematic configurations above, and don't have debugging
(BLOCKIRQ) or in-kernel PIT with reinjection, that's basically the
only one that's left.
Paolo
> Alejandro
>
> >
> > Paolo
> >
> > (*) you have to draw a line somewhere. New processor models may
> > virtualize parts of the CPU in such a way that some stats become
> > meaningless or just stay at zero. Should KVM not support those
> > features because it is not possible anymore to introspect the guest
> > through stat?
> >
> >> Tracepoints are explicitly not ABI, and so we can be much more permissive when it
> >> comes to adding/expanding tracepoints, specifically because there are no guarantees
> >> provided to userspace.
> >>
> >
> >
>
* Re: [RFC 0/3] Export APICv-related state via binary stats interface
2024-04-16 19:51 ` Paolo Bonzini
2024-04-16 21:29 ` Alejandro Jimenez
@ 2024-04-16 22:34 ` Sean Christopherson
2024-04-17 9:48 ` Paolo Bonzini
1 sibling, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2024-04-16 22:34 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Alejandro Jimenez, kvm, linux-kernel, joao.m.martins,
boris.ostrovsky, mark.kanda, suravee.suthikulpanit, mlevitsk
On Tue, Apr 16, 2024, Paolo Bonzini wrote:
> On Tue, Apr 16, 2024 at 8:08 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Thu, Feb 15, 2024, Alejandro Jimenez wrote:
> > > The goal of this RFC is to agree on a mechanism for querying the state (and
> > > related stats) of APICv/AVIC. I clearly have an AVIC bias when approaching this
> > > topic since that is the side that I have mostly looked at, and has the greater
> > > number of possible inhibits, but I believe the argument applies for both
> > > vendor's technologies.
> > >
> > > Currently, a user or monitoring app trying to determine if APICv is actually
> > > being used needs implementation-specific knowlegde in order to look for specific
> > > types of #VMEXIT (i.e. AVIC_INCOMPLETE_IPI/AVIC_NOACCEL), checking GALog events
> > > by watching /proc/interrupts for AMD-Vi*-GA, etc. There are existing tracepoints
> > > (e.g. kvm_apicv_accept_irq, kvm_avic_ga_log) that make this task easier, but
> > > tracefs is not viable in some scenarios. Adding kvm debugfs entries has similar
> > > downsides. Suravee has previously proposed a new IOCTL interface[0] to expose
> > > this information, but there has not been any development in that direction.
> > > Sean has mentioned a preference for using BPF to extract info from the current
> > > tracepoints, which would require reworking existing structs to access some
> > > desired data, but as far as I know there isn't any work done on that approach
> > > yet.
> > >
> > > Recently Joao mentioned another alternative: the binary stats framework that is
> > > already supported by kernel[1] and QEMU[2].
> >
> > The hiccup with stats is that they are ABI, e.g. we can't (easily) ditch stats
> > once they're added, and KVM needs to maintain the exact behavior.
>
> Stats are not ABI---why would they be?
Because they are exposed through an ioctl(), and userspace can and will use stats for
functional purposes? Maybe I just had the wrong takeaway from an old thread about
adding a big pile of stats[1], where unfortunately (for me) you weighed in on
whether or not tracepoints are ABI, but not stats.
And because I've been operating under the assumption that stats are ABI, I've
been guiding people to using stats to make decisions in userspace. E.g. KVM doesn't
currently expose is_guest_mode() in kvm_run, but it is a stat, so it's not hard
to imagine userspace using the stat to make decisions without needing to call back
into KVM[2].
And based on the old discussion[1] I doubt I'm the only one who has this view.
That said, I'm definitely not opposed to stats _not_ being ABI, because that would
give us a ton of flexibility. E.g. we have a non-trivial number of internal stats
that are super useful _for us_, but are rather heavy and might not be desirable
for most environments. If stats aren't considered ABI, then I'm pretty sure we
could land some of the more generally useful ones upstream, but off-by-default
and guarded by a Kconfig. E.g. we have a pile of stats related to mmu_lock that
are very helpful in identifying performance issues, but they aren't things I would
want enabled by default.
But if we do decide stats aren't ABI, then we need to document and disseminate
that information.
[1] https://lore.kernel.org/all/CALzav=cuzT=u6G0TCVZUfEgAKOCKTSCDE8x2v5qc-Gd_NL-pzg@mail.gmail.com
[2] https://lore.kernel.org/all/Zh6-e9hy7U6DD2QM@google.com
> They have an established meaning and it's not a good idea to change it, but
> it's not an absolute no-no(*); and removing them is not a problem at all.
>
> For example, if stats were ABI, there would be no need for the
> introspection mechanism, you could just use a struct like ethtool
> stats (which *are* ABI).
>
> Not everything makes a good stat but, if in doubt and it's cheap
> enough to collect it, go ahead and add it.
Marc raised the (IMO) valid concern that "if it's cheap, add it" will lead to
death by a thousand cuts. E.g. add a few hundred vCPU stats and suddenly vCPUs
consume an extra KiB or three of memory.
A few APIC stats obviously aren't going to move the needle much, I'm just pointing
out that not everyone agrees that KVM should be hyper permissive when it comes to
adding new stats.
> Cheap collection is the important point, because tracepoints in a hot path
> can be so expensive as to slow down the guest substantially, at least in
> microbenchmarks.
>
> In this case I'm not sure _all_ inhibits make sense and I certainly
> wouldn't want a bitmask, but a generic APICv-enabled stat certainly
> makes sense, and perhaps another for a weirdly-configured local APIC.
>
> Paolo
>
> (*) you have to draw a line somewhere. New processor models may
> virtualize parts of the CPU in such a way that some stats become
> meaningless or just stay at zero. Should KVM not support those
> features because it is not possible anymore to introspect the guest
> through stat?
* Re: [RFC 0/3] Export APICv-related state via binary stats interface
2024-04-16 22:34 ` Sean Christopherson
@ 2024-04-17 9:48 ` Paolo Bonzini
2024-04-23 20:43 ` Alejandro Jimenez
0 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2024-04-17 9:48 UTC (permalink / raw)
To: Sean Christopherson
Cc: Alejandro Jimenez, kvm, linux-kernel, joao.m.martins,
boris.ostrovsky, mark.kanda, suravee.suthikulpanit, mlevitsk
On Wed, Apr 17, 2024 at 12:35 AM Sean Christopherson <seanjc@google.com> wrote:
> > > The hiccup with stats is that they are ABI, e.g. we can't (easily) ditch stats
> > > once they're added, and KVM needs to maintain the exact behavior.
> >
> > Stats are not ABI---why would they be?
>
> Because they are exposed through an ioctl(), and userspace can and will use stats for
> functional purposes? Maybe I just had the wrong takeaway from an old thread about
> adding a big pile of stats[1], where unfortunately (for me) you weighed in on
> whether or not tracepoints are ABI, but not stats.
I think we can agree that:
- you don't want hundreds of stats (Marc's point)
- a large part of the stats are very stable, but just as many (while
useful) depend on details which are very much implementation dependent
- a subset of stats is pretty close to being ABI (e.g. guest_mode),
but others can change meaning depending on processor model, guest
configuration and/or guest type (e.g. all of them affect
interrupt-related stats due to APICv inhibits).
While there are exceptions, the main consumer of stats (but indeed not
the only one) is intended to be the user, not a program. This is the
same as tracepoints, and it's why the introspection bits exist.
(User-friendliness also means that bitmask stats are "ouch"; I guess
we could add support for bit-sized boolean stats if things get out of
control).
For many stats, using them for functional purposes would be
wrong/dumb. You have to be ready for the exact behavior to change even
if the stats remain the same. If userspace doesn't, it's being dumb.
KVM can't be blocked from supporting new features just because they
"break" stats, and shouldn't be blocked from adding useful debugging
stats just because userspace could be dumb.
For example, the point of pf_fast is mostly to compare it with other
pf_* stats and see if there's something smelly going on. pf_fast used
to affect pretty much only dirty bits; nowadays it also affects
accessed bits on !ept_ad machines and it does not affect dirty bits if
you have PML. So in the past it was possible to use pf_fast as a proxy
for the number of dirty pages, for example during migration. That
usage doesn't work anymore. Tough luck.
Perhaps you could use the halt-polling stats to toggle DISABLE_EXITS
for VMs that consume a lot of time polling. That would be a more
plausible use of stats to drive heuristics, but again, the power to
look into low-level details of KVM and guest behavior comes with the
accompanying responsibility. It is _not_ guaranteed to keep working in
the same way as processors come out and optimizations are added to
KVM.
So you would have to look at intentional stats breakages one by one,
and this is again a lot like tracepoints. And many potential breakages
would go unnoticed anyway, because there's an infinite supply of bad
ideas when it comes to stats and heuristics.
> That said, I'm definitely not opposed to stats _not_ being ABI, because that would
> give us a ton of flexibility. E.g. we have a non-trivial number of internal stats
> that are super useful _for us_, but are rather heavy and might not be desirable
> for most environments. If stats aren't considered ABI, then I'm pretty sure we
> could land some of the more generally useful ones upstream, but off-by-default
> and guarded by a Kconfig. E.g. we have a pile of stats related to mmu_lock that
> are very helpful in identifying performance issues, but they aren't things I would
> want enabled by default.
That would be great indeed.
> > Not everything makes a good stat but, if in doubt and it's cheap
> > enough to collect it, go ahead and add it.
>
> Marc raised the (IMO) valid concern that "if it's cheap, add it" will lead to
> death by a thousand cuts. E.g. add a few hundred vCPU stats and suddenly vCPUs
> consume an extra KiB or three of memory.
>
> A few APIC stats obviously aren't going to move the needle much, I'm just pointing
> out that not everyone agrees that KVM should be hyper permissive when it comes to
> adding new stats.
Yeah, that's why I made it conditional on "if in doubt". "Stats are
not ABI" is not a free pass to add anything, also because the truth is
closer to "Stats are generally not ABI, but keeping them stable is a
good idea". Many more stats are obviously bad to have upstream than
there are good ones; and when adding stats it makes sense to consider
their stability but without making it an absolute criterion for
inclusion.
So for this patch, I would weigh advantages to be substantial:
+ APICv inhibits at this point are relatively stable
+ the performance impact is large enough that APICv/AVIC stats _can_
be useful, both boolean and cumulative ones; so for example I'd add an
interrupt_injections stat for unaccelerated injections causing a
vmexit or otherwise hitting lapic.c.
But absolutely would not go with a raw bitmask because:
- the exact set of inhibits is subject to change
- super high detail into niche APICv inhibits is unlikely to be useful
- many if not most inhibits are trivially derived from VM configuration
Paolo
* Re: [RFC 0/3] Export APICv-related state via binary stats interface
2024-04-17 9:48 ` Paolo Bonzini
@ 2024-04-23 20:43 ` Alejandro Jimenez
0 siblings, 0 replies; 25+ messages in thread
From: Alejandro Jimenez @ 2024-04-23 20:43 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, joao.m.martins, boris.ostrovsky, mark.kanda,
suravee.suthikulpanit, mlevitsk, Chao Gao, Dongli Zhang,
Vasant Hegde
On 4/17/24 05:48, Paolo Bonzini wrote:
> On Wed, Apr 17, 2024 at 12:35 AM Sean Christopherson <seanjc@google.com> wrote:
>>>> The hiccup with stats is that they are ABI, e.g. we can't (easily) ditch stats
>>>> once they're added, and KVM needs to maintain the exact behavior.
>>>
>>> Stats are not ABI---why would they be?
>>
>> Because they are exposed through an ioctl(), and userspace can and will use stats for
>> functional purposes? Maybe I just had the wrong takeaway from an old thread about
>> adding a big pile of stats[1], where unfortunately (for me) you weighed in on
>> whether or not tracepoints are ABI, but not stats.
>
> I think we can agree that:
> - you don't want hundreds of stats (Marc's point)
> - a large part of the stats are very stable, but just as many (while
> useful) depend on details which are very much implementation dependent
> - a subset of stats is pretty close to being ABI (e.g. guest_mode),
> but others can change meaning depending on processor model, guest
> configuration and/or guest type (e.g. all of them affect
> interrupt-related stats due to APICv inhibits).
>
> While there are exceptions, the main consumer of stats (but indeed not
> the only one) is intended to be the user, not a program. This is the
> same as tracepoints, and it's why the introspection bits exist.
> (User-friendliness also means that bitmask stats are "ouch"; I guess
> we could add support for bit-sized boolean stats if things get out of
> control).
>
> For many stats, using them for functional purposes would be
> wrong/dumb. You have to be ready for the exact behavior to change even
> if the stats remain the same. If userspace doesn't, it's being dumb.
> KVM can't be blocked from supporting new features just because they
> "break" stats, and shouldn't be blocked from adding useful debugging
> stats just because userspace could be dumb.
[ trim ]
>
>> That said, I'm definitely not opposed to stats _not_ being ABI, because that would
>> give us a ton of flexibility. E.g. we have a non-trivial number of internal stats
>> that are super useful _for us_, but are rather heavy and might not be desirable
>> for most environments. If stats aren't considered ABI, then I'm pretty sure we
>> could land some of the more generally useful ones upstream, but off-by-default
>> and guarded by a Kconfig. E.g. we have a pile of stats related to mmu_lock that
>> are very helpful in identifying performance issues, but they aren't things I would
>> want enabled by default.
>
> That would be great indeed.
>
>>> Not everything makes a good stat but, if in doubt and it's cheap
>>> enough to collect it, go ahead and add it.
>>
>> Marc raised the (IMO) valid concern that "if it's cheap, add it" will lead to
>> death by a thousand cuts. E.g. add a few hundred vCPU stats and suddenly vCPUs
>> consume an extra KiB or three of memory.
>>
>> A few APIC stats obviously aren't going to move the needle much, I'm just pointing
>> out that not everyone agrees that KVM should be hyper permissive when it comes to
>> adding new stats.
>
> Yeah, that's why I made it conditional on "if in doubt". "Stats are
> not ABI" is not a free pass to add anything, also because the truth is
> closer to "Stats are generally not ABI, but keeping them stable is a
> good idea". Many more stats are obviously bad to have upstream than
> there are good ones; and when adding stats it makes sense to consider
> their stability but without making it an absolute criterion for
> inclusion.
>
> So for this patch, I would weigh advantages to be substantial:
> + APICv inhibits at this point are relatively stable
> + the performance impact is large enough that APICv/AVIC stats _can_
> be useful, both boolean and cumulative ones; so for example I'd add an
> interrupt_injections stat for unaccelerated injections causing a
> vmexit or otherwise hitting lapic.c.
So far it seems there is support for using the stats interface to expose:
- APICv status: (apicv_enabled, boolean, per-vCPU)
- Windows guest using SynIC's AutoEOI: (synic_auto_eoi_used, boolean, per-VM)
- KVM PIT in reinject mode inhibits AVIC: (pit_reinject_mode, boolean, per-VM)
- APICv unaccelerated injections causing a vmexit (i.e. AVIC_INCOMPLETE_IPI,
AVIC_UNACCELERATED_ACCESS, APIC_WRITE): (apicv_unaccelerated_inj, counter,
per-vCPU)
You'll notice that I framed the AutoEOI usage and PIT reinject mode as their
own stats, as opposed to linking them to APICv inhibits, so it requires a
certain level of knowledge about the APICv limitations to make the connection.
Is this the preferred approach or would we want to associate them more
explicitly to APICv inhibitions?
Alternatively...
>
> But absolutely would not go with a raw bitmask because:
> - the exact set of inhibits is subject to change
> - super high detail into niche APICv inhibits is unlikely to be useful
> - many if not most inhibits are trivially derived from VM configuration
I wasn't able to fully make my case during the last PUCK meeting, so I'll try
here again. I'd like to insist on the opportunity to provide visibility into
APICv state with minimal change by exposing the current inhibit reasons. I am
aware that Sean and Paolo have expressed opposition to it, so I'll only insist
one more time:
Putting aside valid concerns about setting a precedent for exposing a bitmask
via stats, I'd argue in this case the apicv_inhibit_reasons is essentially a
tri-state variable. i.e. we can have a per-vCPU stat with possible values:
0 --> APICv is enabled and active
1 --> APICv is disabled (either by module parameter or lack of HW support)
>1 --> APICv is disabled due to other reason(s)
The >1 values must be decoded using the kvm_host.h header, same as we are forced
to do for tracepoints until (shameless plug):
https://lore.kernel.org/all/20240214223554.1033154-1-alejandro.j.jimenez@oracle.com/
is hopefully merged.
So as long as the method for decoding is documented in KVM docs (which would I
add as a patch in the series), a single stat can fully expose the APICv state
and be useful for debugging too. It could replace three of the stats proposed:
apicv_enabled, synic_auto_eoi_used, pit_reinject_mode.
Cons:
- The exact set of inhibits is subject to change / detail into niche APICv
inhibits is unlikely to be useful.
The state described by 0 and 1 values of the stat are unlikely to change, and it
costs nothing to provide additional meaning in the remaining bits. The fact that
inhibits are likely to change/grow makes this approach better, since we avoid
creating stats to describe scenarios that become obsolete, e.g. if SynIC AutoEOI
is fully deprecated or split irqchip becomes the default.
- many if not most inhibits are trivially derived from VM configuration
This assumes non-trivial and implementation-specific knowledge about VMM/guest
defaults, and limitations of the AVIC/APICv host hardware, which have also been
known to change e.g. Milan (unofficially) supports AVIC, but not x2AVIC.
Some inhibits will be set by default currently e.g. QEMU uses in-kernel irqchip
by default since pc-q35-4.0.1, and KVM defaults to reinject mode when creating a
PIT.
ExtINT can be inferred via IRQ window exits, which are rare; but conversely,
noticing that AVIC is inhibited due to an IRQ window request can draw attention
to problems in that unlikely code path, as I recently found while debugging an
issue on OVMF + Secure Boot. (anecdotal)
VMMs other than QEMU and/or guests other than Linux can be inhibited due to
"less common" reasons than the ones we see today. (hypothetical)
--
I understand concerns about overloading the stats interface, but it seems a lost
opportunity and a handicap to usability if we provide a boolean for APICv
but no information about the reasons why it could not be enabled, especially
since in some real scenarios (e.g. hosts enabling kernel lockdown in
confidentiality mode, normally paired with Secure Boot) that information will
not be accessible via other mechanisms like tracepoints or BPF.
If the idea of exposing the inhibit reasons is still a NACK, then please let me
know if you agree with the 4 proposed stats mentioned at the beginning and I'll
send a v2 that implements them.
Thank you,
Alejandro
>
> Paolo
>
* Re: [RFC 3/3] x86: KVM: stats: Add a stat counter for GALog events
2024-04-16 18:35 ` Sean Christopherson
@ 2024-04-24 0:50 ` Alejandro Jimenez
0 siblings, 0 replies; 25+ messages in thread
From: Alejandro Jimenez @ 2024-04-24 0:50 UTC (permalink / raw)
To: Sean Christopherson, Chao Gao
Cc: kvm, pbonzini, linux-kernel, joao.m.martins, boris.ostrovsky,
mark.kanda, suravee.suthikulpanit, mlevitsk
On 4/16/24 14:35, Sean Christopherson wrote:
> On Fri, Apr 12, 2024, Chao Gao wrote:
>> On Tue, Apr 09, 2024 at 09:31:45PM -0400, Alejandro Jimenez wrote:
>>>
>>> On 4/9/24 02:45, Chao Gao wrote:
>>>>> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
>>>>> index 4b74ea91f4e6..853cafe4a9af 100644
>>>>> --- a/arch/x86/kvm/svm/avic.c
>>>>> +++ b/arch/x86/kvm/svm/avic.c
>>>>> @@ -165,8 +165,10 @@ int avic_ga_log_notifier(u32 ga_tag)
>>>>> * bit in the vAPIC backing page. So, we just need to schedule
>>>>> * in the vcpu.
>>>>> */
>>>>> - if (vcpu)
>>>>> + if (vcpu) {
>>>>> kvm_vcpu_wake_up(vcpu);
>>>>> + ++vcpu->stat.ga_log_event;
>>>>> + }
>>>>>
>>>>
>>>> I am not sure why this is added for SVM only.
>>>
>>> I am mostly familiar with AVIC, and much less so with VMX's PI, so this is
>>> why I am likely missing potential stats that could be useful to expose from
>>> the VMX side. I'll be glad to implement any other suggestions you have.
>>>
>>>
>>>> it looks to me GALog events are
>>>> similar to Intel IOMMU's wakeup events. Can we have a general name? maybe
>>>> iommu_wakeup_event
>>>
>>> I believe that after:
>>> d588bb9be1da ("KVM: VMX: enable IPI virtualization")
>>>
>>> both the VT-d PI and the virtualized IPIs code paths will use POSTED_INTR_WAKEUP_VECTOR
>>> for interrupts targeting a blocked vCPU. So on Intel hosts enabling IPI virtualization,
>>> a counter incremented in pi_wakeup_handler() would record interrupts from both virtualized
>>> IPIs and VT-d sources.
>>>
>>> I don't think it is correct to generalize this counter since AMD's implementation is
>>> different; when a blocked vCPU is targeted:
>>>
>>> - by device interrupts, it uses the GA Log mechanism
>>> - by an IPI, it generates an AVIC_INCOMPLETE_IPI #VMEXIT
>>>
>>> If the reasoning above is correct, we can add a VMX specific counter (vmx_pi_wakeup_event?)
>>> that is increased in pi_wakeup_handler() as you suggest, and document the difference
>>> in behavior so that is not confused as equivalent with the ga_log_event counter.
>>
>> Correct. If we cannot generalize the counter, I think it is ok to
>> add the counter for SVM only. Thank you for the clarification.
>
> There's already a generic stat, halt_wakeup, that more or less covers this case.
I don't think we can extrapolate PI-originated wake ups from halt_wakeup, since it
can/will also be triggered with APICv/AVIC disabled.
> And despite what the comment says, avic_ga_log_notifier() does NOT schedule in
> the task, kvm_vcpu_wake_up() only wakes up blocking vCPUs, no more, no less.
True, both the GA log and the PI wake up handler just call kvm_vcpu_wake_up().
>
> I'm also not at all convinced that KVM needs to differentiate between IPIs and
> device interrupts that arrive when the vCPU isn't in the guest. E.g. this can
> kinda sorta be used to confirm IRQ affinity, but if the vCPU is happily running
> in the guest, such a heuristic will get false negatives.
>
> And for confirming that GA logging is working, that's more or less covered by the
> proposed APICv stat. If AVIC is enabled, the VM has assigned devices, and GA logging
> *isn't* working, then you'll probably find out quite quickly because the VM will
> have a lot of missed interrupts, e.g. vCPUs will get stuck in HLT.
ACK, if the device interrupts are not being handled correctly there will be lots of
complaints during device initialization as we have seen before.
There is one scenario in which you can have APICv/AVIC enabled but only doing the
IPI acceleration, while device interrupts are still using the legacy path.
It requires booting the host kernel with 'amd_iommu_intr=legacy'(AMD) or with
'intremap=nopost'(Intel), so that is a special case since you must explicitly
request the behavior.
In short, I typically use the GA Log tracepoint to confirm IOMMU AVIC is working
as expected, so I wanted to provide the equivalent via the stats.
If we want to have a common stat, we could have a pi_wakeup stat that is incremented
both in ga_log and vmx_pi_wakeup_event, but I do understand that is not strictly
necessary, especially if we want to be conservative with the number of stats.
Thank you,
Alejandro
end of thread, other threads:[~2024-04-24 0:51 UTC | newest]
Thread overview: 25+ messages
2024-02-15 16:01 [RFC 0/3] Export APICv-related state via binary stats interface Alejandro Jimenez
2024-02-15 16:01 ` [RFC 1/3] x86: KVM: stats: Add a stat to report status of APICv inhibition Alejandro Jimenez
2024-04-09 14:54 ` Suthikulpanit, Suravee
2024-04-16 18:19 ` Sean Christopherson
2024-04-16 19:53 ` Paolo Bonzini
2024-04-16 19:59 ` Alejandro Jimenez
2024-02-15 16:01 ` [RFC 2/3] x86: KVM: stats: Add stat counter for IRQs injected via APICv Alejandro Jimenez
2024-02-15 16:16 ` Dongli Zhang
2024-02-15 18:12 ` Alejandro Jimenez
2024-04-16 18:25 ` Sean Christopherson
2024-02-15 16:01 ` [RFC 3/3] x86: KVM: stats: Add a stat counter for GALog events Alejandro Jimenez
2024-04-09 6:45 ` Chao Gao
2024-04-10 1:31 ` Alejandro Jimenez
2024-04-12 10:45 ` Chao Gao
2024-04-16 18:35 ` Sean Christopherson
2024-04-24 0:50 ` Alejandro Jimenez
2024-04-09 5:09 ` [RFC 0/3] Export APICv-related state via binary stats interface Vasant Hegde
2024-04-10 1:36 ` Alejandro Jimenez
2024-04-16 18:08 ` Sean Christopherson
2024-04-16 19:51 ` Paolo Bonzini
2024-04-16 21:29 ` Alejandro Jimenez
2024-04-16 21:36 ` Paolo Bonzini
2024-04-16 22:34 ` Sean Christopherson
2024-04-17 9:48 ` Paolo Bonzini
2024-04-23 20:43 ` Alejandro Jimenez