* [PATCH 0/5] AVIC bugfixes and workarounds
@ 2023-09-28 15:04 Maxim Levitsky
2023-09-28 15:04 ` [PATCH 1/5] x86: KVM: SVM: fix for x2avic CVE-2023-5090 Maxim Levitsky
` (5 more replies)
0 siblings, 6 replies; 14+ messages in thread
From: Maxim Levitsky @ 2023-09-28 15:04 UTC (permalink / raw)
To: kvm
Cc: Will Deacon, Borislav Petkov, Dave Hansen, Suravee Suthikulpanit,
Thomas Gleixner, Paolo Bonzini, x86, Robin Murphy, iommu,
Ingo Molnar, Joerg Roedel, Sean Christopherson, H. Peter Anvin,
linux-kernel, Maxim Levitsky
Hi!
This patch series includes several fixes to AVIC I found while working
on a new version of nested AVIC code.
Also while developing it I realized that a very simple workaround for
AVIC's errata #1235 exists and included it in this patch series as well.
Best regards,
Maxim Levitsky
Maxim Levitsky (5):
x86: KVM: SVM: fix for x2avic CVE-2023-5090
x86: KVM: SVM: add support for Invalid IPI Vector interception
x86: KVM: SVM: refresh AVIC inhibition in svm_leave_nested()
iommu/amd: skip updating the IRTE entry when is_run is already false
x86: KVM: SVM: workaround for AVIC's errata #1235
arch/x86/include/asm/svm.h | 1 +
arch/x86/kvm/svm/avic.c | 55 +++++++++++++++++++++++++++-----------
arch/x86/kvm/svm/nested.c | 3 +++
arch/x86/kvm/svm/svm.c | 3 +--
drivers/iommu/amd/iommu.c | 9 +++++++
5 files changed, 54 insertions(+), 17 deletions(-)
--
2.26.3
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 1/5] x86: KVM: SVM: fix for x2avic CVE-2023-5090
2023-09-28 15:04 [PATCH 0/5] AVIC bugfixes and workarounds Maxim Levitsky
@ 2023-09-28 15:04 ` Maxim Levitsky
2023-09-28 15:53 ` Sean Christopherson
2023-09-28 15:04 ` [PATCH 2/5] x86: KVM: SVM: add support for Invalid IPI Vector interception Maxim Levitsky
` (4 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: Maxim Levitsky @ 2023-09-28 15:04 UTC (permalink / raw)
To: kvm
Cc: Will Deacon, Borislav Petkov, Dave Hansen, Suravee Suthikulpanit,
Thomas Gleixner, Paolo Bonzini, x86, Robin Murphy, iommu,
Ingo Molnar, Joerg Roedel, Sean Christopherson, H. Peter Anvin,
linux-kernel, Maxim Levitsky, stable
The following problem exists since the x2avic was enabled in the KVM:
svm_set_x2apic_msr_interception is called to enable the interception of
the x2apic msrs.
In particular it is called at the moment the guest resets its apic.
Assuming that the guest's apic was in x2apic mode, the reset will bring
it back to the xapic mode.
The svm_set_x2apic_msr_interception however has an erroneous check for
'!apic_x2apic_mode()' which prevents it from doing anything in this case.
As a result of this, all x2apic msrs are left unintercepted, and that
exposes the bare metal x2apic (if enabled) to the guest.
Oops.
Remove the erroneous '!apic_x2apic_mode()' check to fix that.
Cc: stable@vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
arch/x86/kvm/svm/svm.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9507df93f410a63..acdd0b89e4715a3 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -913,8 +913,7 @@ void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool intercept)
if (intercept == svm->x2avic_msrs_intercepted)
return;
- if (!x2avic_enabled ||
- !apic_x2apic_mode(svm->vcpu.arch.apic))
+ if (!x2avic_enabled)
return;
for (i = 0; i < MAX_DIRECT_ACCESS_MSRS; i++) {
--
2.26.3
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 2/5] x86: KVM: SVM: add support for Invalid IPI Vector interception
2023-09-28 15:04 [PATCH 0/5] AVIC bugfixes and workarounds Maxim Levitsky
2023-09-28 15:04 ` [PATCH 1/5] x86: KVM: SVM: fix for x2avic CVE-2023-5090 Maxim Levitsky
@ 2023-09-28 15:04 ` Maxim Levitsky
2023-09-28 15:46 ` Sean Christopherson
2023-09-28 15:04 ` [PATCH 3/5] x86: KVM: SVM: refresh AVIC inhibition in svm_leave_nested() Maxim Levitsky
` (3 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: Maxim Levitsky @ 2023-09-28 15:04 UTC (permalink / raw)
To: kvm
Cc: Will Deacon, Borislav Petkov, Dave Hansen, Suravee Suthikulpanit,
Thomas Gleixner, Paolo Bonzini, x86, Robin Murphy, iommu,
Ingo Molnar, Joerg Roedel, Sean Christopherson, H. Peter Anvin,
linux-kernel, Maxim Levitsky, stable
In later revisions of AMD's APM, there is a new 'incomplete IPI' exit code:
"Invalid IPI Vector - The vector for the specified IPI was set to an
illegal value (VEC < 16)"
Note that tests on Zen2 machine show that this VM exit doesn't happen and
instead AVIC just does nothing.
Add support for this exit code by doing nothing, instead of filling
the kernel log with errors.
Also replace an unthrottled 'pr_err()' if another unknown incomplete
IPI exit happens with WARN_ON_ONCE()
(e.g in case AMD adds yet another 'Invalid IPI' exit reason)
Cc: <stable@vger.kernel.org>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
arch/x86/include/asm/svm.h | 1 +
arch/x86/kvm/svm/avic.c | 5 ++++-
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 19bf955b67e0da0..3ac0ffc4f3e202b 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -268,6 +268,7 @@ enum avic_ipi_failure_cause {
AVIC_IPI_FAILURE_TARGET_NOT_RUNNING,
AVIC_IPI_FAILURE_INVALID_TARGET,
AVIC_IPI_FAILURE_INVALID_BACKING_PAGE,
+ AVIC_IPI_FAILURE_INVALID_IPI_VECTOR,
};
#define AVIC_PHYSICAL_MAX_INDEX_MASK GENMASK_ULL(8, 0)
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 2092db892d7d052..c44b65af494e3ff 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -529,8 +529,11 @@ int avic_incomplete_ipi_interception(struct kvm_vcpu *vcpu)
case AVIC_IPI_FAILURE_INVALID_BACKING_PAGE:
WARN_ONCE(1, "Invalid backing page\n");
break;
+ case AVIC_IPI_FAILURE_INVALID_IPI_VECTOR:
+ /* Invalid IPI with vector < 16 */
+ break;
default:
- pr_err("Unknown IPI interception\n");
+ WARN_ONCE(1, "Unknown avic incomplete IPI interception\n");
}
return 1;
--
2.26.3
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 3/5] x86: KVM: SVM: refresh AVIC inhibition in svm_leave_nested()
2023-09-28 15:04 [PATCH 0/5] AVIC bugfixes and workarounds Maxim Levitsky
2023-09-28 15:04 ` [PATCH 1/5] x86: KVM: SVM: fix for x2avic CVE-2023-5090 Maxim Levitsky
2023-09-28 15:04 ` [PATCH 2/5] x86: KVM: SVM: add support for Invalid IPI Vector interception Maxim Levitsky
@ 2023-09-28 15:04 ` Maxim Levitsky
2023-09-28 16:03 ` Sean Christopherson
2023-09-28 15:04 ` [PATCH 4/5] iommu/amd: skip updating the IRTE entry when is_run is already false Maxim Levitsky
` (2 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: Maxim Levitsky @ 2023-09-28 15:04 UTC (permalink / raw)
To: kvm
Cc: Will Deacon, Borislav Petkov, Dave Hansen, Suravee Suthikulpanit,
Thomas Gleixner, Paolo Bonzini, x86, Robin Murphy, iommu,
Ingo Molnar, Joerg Roedel, Sean Christopherson, H. Peter Anvin,
linux-kernel, Maxim Levitsky, stable
svm_leave_nested() similar to a nested VM exit, get the vCPU out of nested
mode and thus should end the local inhibition of AVIC on this vCPU.
Failure to do so, can lead to hangs on guest reboot.
Raise the KVM_REQ_APICV_UPDATE request to refresh the AVIC state of the
current vCPU in this case.
Cc: stable@vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
arch/x86/kvm/svm/nested.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index dd496c9e5f91f28..3fea8c47679e689 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1253,6 +1253,9 @@ void svm_leave_nested(struct kvm_vcpu *vcpu)
nested_svm_uninit_mmu_context(vcpu);
vmcb_mark_all_dirty(svm->vmcb);
+
+ if (kvm_apicv_activated(vcpu->kvm))
+ kvm_make_request(KVM_REQ_APICV_UPDATE, vcpu);
}
kvm_clear_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
--
2.26.3
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 4/5] iommu/amd: skip updating the IRTE entry when is_run is already false
2023-09-28 15:04 [PATCH 0/5] AVIC bugfixes and workarounds Maxim Levitsky
` (2 preceding siblings ...)
2023-09-28 15:04 ` [PATCH 3/5] x86: KVM: SVM: refresh AVIC inhibition in svm_leave_nested() Maxim Levitsky
@ 2023-09-28 15:04 ` Maxim Levitsky
2023-09-28 17:27 ` Joao Martins
2023-09-28 15:04 ` [PATCH 5/5] x86: KVM: SVM: workaround for AVIC's errata #1235 Maxim Levitsky
2024-03-26 3:15 ` [PATCH 0/5] AVIC bugfixes and workarounds Jim Mattson
5 siblings, 1 reply; 14+ messages in thread
From: Maxim Levitsky @ 2023-09-28 15:04 UTC (permalink / raw)
To: kvm
Cc: Will Deacon, Borislav Petkov, Dave Hansen, Suravee Suthikulpanit,
Thomas Gleixner, Paolo Bonzini, x86, Robin Murphy, iommu,
Ingo Molnar, Joerg Roedel, Sean Christopherson, H. Peter Anvin,
linux-kernel, Maxim Levitsky
When vCPU affinity of an IRTE which already has
is_run == false, is updated and the update also sets is_run to false,
there is nothing to do.
The goal of this patch is to make a call to 'amd_iommu_update_ga()'
to be relatively cheap if there is nothing to do.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
drivers/iommu/amd/iommu.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 95bd7c25ba6f366..10bcd436e984672 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3774,6 +3774,15 @@ int amd_iommu_update_ga(int cpu, bool is_run, void *data)
entry->hi.fields.destination =
APICID_TO_IRTE_DEST_HI(cpu);
}
+
+ if (!is_run && !entry->lo.fields_vapic.is_run) {
+ /*
+ * No need to notify the IOMMU about an entry which
+ * already has is_run == False
+ */
+ return 0;
+ }
+
entry->lo.fields_vapic.is_run = is_run;
return modify_irte_ga(ir_data->iommu, ir_data->irq_2_irte.devid,
--
2.26.3
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 5/5] x86: KVM: SVM: workaround for AVIC's errata #1235
2023-09-28 15:04 [PATCH 0/5] AVIC bugfixes and workarounds Maxim Levitsky
` (3 preceding siblings ...)
2023-09-28 15:04 ` [PATCH 4/5] iommu/amd: skip updating the IRTE entry when is_run is already false Maxim Levitsky
@ 2023-09-28 15:04 ` Maxim Levitsky
2023-09-28 15:37 ` Sean Christopherson
2024-03-26 3:15 ` [PATCH 0/5] AVIC bugfixes and workarounds Jim Mattson
5 siblings, 1 reply; 14+ messages in thread
From: Maxim Levitsky @ 2023-09-28 15:04 UTC (permalink / raw)
To: kvm
Cc: Will Deacon, Borislav Petkov, Dave Hansen, Suravee Suthikulpanit,
Thomas Gleixner, Paolo Bonzini, x86, Robin Murphy, iommu,
Ingo Molnar, Joerg Roedel, Sean Christopherson, H. Peter Anvin,
linux-kernel, Maxim Levitsky
On Zen2 (and likely on Zen1 as well), AVIC doesn't reliably detect a change
in the 'is_running' bit during ICR write emulation and might skip a
VM exit, if that bit was recently cleared.
The absence of the VM exit, leads to the KVM not waking up / triggering
nested vm exit on the target(s) of the IPI which can, in some cases,
lead to an unbounded delays in the guest execution.
As I recently discovered, a reasonable workaround exists: make the KVM
never set the is_running bit.
This workaround ensures that (*) all ICR writes always cause a VM exit
and therefore correctly emulated, in expense of never enjoying VM exit-less
ICR emulation.
This workaround does carry a performance penalty but according to my
benchmarks is still much better than not using AVIC at all,
because AVIC is still used for the receiving end of the IPIs, and for the
posted interrupts.
If the user is aware of the errata and it doesn't affect his workload,
the user can disable the workaround with 'avic_zen2_errata_workaround=0'
(*) More correctly all ICR writes except when 'Self' shorthand is used:
In this case AVIC skips reading physid table and just sets bits in IRR
of local APIC. Thankfully in this case, the errata is not possible,
therefore an extra workaround for this case is not needed.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
arch/x86/kvm/svm/avic.c | 50 +++++++++++++++++++++++++++++------------
1 file changed, 36 insertions(+), 14 deletions(-)
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index c44b65af494e3ff..df9efa428f86aa9 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -62,6 +62,9 @@ static_assert(__AVIC_GATAG(AVIC_VM_ID_MASK, AVIC_VCPU_ID_MASK) == -1u);
static bool force_avic;
module_param_unsafe(force_avic, bool, 0444);
+static int avic_zen2_errata_workaround = -1;
+module_param(avic_zen2_errata_workaround, int, 0444);
+
/* Note:
* This hash table is used to map VM_ID to a struct kvm_svm,
* when handling AMD IOMMU GALOG notification to schedule in
@@ -1027,7 +1030,6 @@ avic_update_iommu_vcpu_affinity(struct kvm_vcpu *vcpu, int cpu, bool r)
void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
- u64 entry;
int h_physical_id = kvm_cpu_get_apicid(cpu);
struct vcpu_svm *svm = to_svm(vcpu);
unsigned long flags;
@@ -1056,14 +1058,18 @@ void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
*/
spin_lock_irqsave(&svm->ir_list_lock, flags);
- entry = READ_ONCE(*(svm->avic_physical_id_cache));
- WARN_ON_ONCE(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK);
+ if (!avic_zen2_errata_workaround) {
+ u64 entry = READ_ONCE(*(svm->avic_physical_id_cache));
- entry &= ~AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK;
- entry |= (h_physical_id & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK);
- entry |= AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
+ WARN_ON_ONCE(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK);
+
+ entry &= ~AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK;
+ entry |= (h_physical_id & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK);
+ entry |= AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
+
+ WRITE_ONCE(*(svm->avic_physical_id_cache), entry);
+ }
- WRITE_ONCE(*(svm->avic_physical_id_cache), entry);
avic_update_iommu_vcpu_affinity(vcpu, h_physical_id, true);
spin_unlock_irqrestore(&svm->ir_list_lock, flags);
@@ -1071,7 +1077,7 @@ void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
void avic_vcpu_put(struct kvm_vcpu *vcpu)
{
- u64 entry;
+ u64 entry = 0;
struct vcpu_svm *svm = to_svm(vcpu);
unsigned long flags;
@@ -1084,11 +1090,13 @@ void avic_vcpu_put(struct kvm_vcpu *vcpu)
* can't be scheduled out and thus avic_vcpu_{put,load}() can't run
* recursively.
*/
- entry = READ_ONCE(*(svm->avic_physical_id_cache));
- /* Nothing to do if IsRunning == '0' due to vCPU blocking. */
- if (!(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK))
- return;
+ if (!avic_zen2_errata_workaround) {
+ /* Nothing to do if IsRunning == '0' due to vCPU blocking. */
+ entry = READ_ONCE(*(svm->avic_physical_id_cache));
+ if (!(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK))
+ return;
+ }
/*
* Take and hold the per-vCPU interrupt remapping lock while updating
@@ -1102,8 +1110,10 @@ void avic_vcpu_put(struct kvm_vcpu *vcpu)
avic_update_iommu_vcpu_affinity(vcpu, -1, 0);
- entry &= ~AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
- WRITE_ONCE(*(svm->avic_physical_id_cache), entry);
+ if (!avic_zen2_errata_workaround) {
+ entry &= ~AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
+ WRITE_ONCE(*(svm->avic_physical_id_cache), entry);
+ }
spin_unlock_irqrestore(&svm->ir_list_lock, flags);
@@ -1217,5 +1227,17 @@ bool avic_hardware_setup(void)
amd_iommu_register_ga_log_notifier(&avic_ga_log_notifier);
+ if (avic_zen2_errata_workaround == -1) {
+
+ /* Assume that Zen1 and Zen2 have errata #1235 */
+ if (boot_cpu_data.x86 == 0x17)
+ avic_zen2_errata_workaround = 1;
+ else
+ avic_zen2_errata_workaround = 0;
+ }
+
+ if (avic_zen2_errata_workaround)
+ pr_info("Workaround for AVIC errata #1235 is enabled\n");
+
return true;
}
--
2.26.3
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 5/5] x86: KVM: SVM: workaround for AVIC's errata #1235
2023-09-28 15:04 ` [PATCH 5/5] x86: KVM: SVM: workaround for AVIC's errata #1235 Maxim Levitsky
@ 2023-09-28 15:37 ` Sean Christopherson
0 siblings, 0 replies; 14+ messages in thread
From: Sean Christopherson @ 2023-09-28 15:37 UTC (permalink / raw)
To: Maxim Levitsky
Cc: kvm, Will Deacon, Borislav Petkov, Dave Hansen,
Suravee Suthikulpanit, Thomas Gleixner, Paolo Bonzini, x86,
Robin Murphy, iommu, Ingo Molnar, Joerg Roedel, H. Peter Anvin,
linux-kernel
KVM: SVM: for the shortlog scope (applies to all relevant patches in this series)
On Thu, Sep 28, 2023, Maxim Levitsky wrote:
> On Zen2 (and likely on Zen1 as well), AVIC doesn't reliably detect a change
> in the 'is_running' bit during ICR write emulation and might skip a
> VM exit, if that bit was recently cleared.
>
> The absence of the VM exit, leads to the KVM not waking up / triggering
> nested vm exit on the target(s) of the IPI which can, in some cases,
> lead to an unbounded delays in the guest execution.
>
> As I recently discovered, a reasonable workaround exists: make the KVM
Nit, please just write "KVM", not "the KVM". KVM is a proper noun when used in
this way, e.g. saying "the KVM" is like saying "the Sean" or "the Maxim".
> never set the is_running bit.
>
> This workaround ensures that (*) all ICR writes always cause a VM exit
> and therefore correctly emulated, in expense of never enjoying VM exit-less
> ICR emulation.
This breaks svm_ir_list_add(), which relies on the vCPU's entry being up-to-date
and marked running to detect that IOMMU needs to be immediately pointed at the
current pCPU.
/*
* Update the target pCPU for IOMMU doorbells if the vCPU is running.
* If the vCPU is NOT running, i.e. is blocking or scheduled out, KVM
* will update the pCPU info when the vCPU awkened and/or scheduled in.
* See also avic_vcpu_load().
*/
entry = READ_ONCE(*(svm->avic_physical_id_cache));
if (entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK)
amd_iommu_update_ga(entry & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK,
true, pi->ir_data);
> This workaround does carry a performance penalty but according to my
> benchmarks is still much better than not using AVIC at all,
> because AVIC is still used for the receiving end of the IPIs, and for the
> posted interrupts.
I really, really don't like the idea of carrying a workaround like this in
perpetuity. If there is a customer that is determined to enable AVIC on Zen1/Zen2,
then *maybe* it's something to consider, but I don't think we should carry this
if the only anticipated beneficiary is one-off users and KVM developers. IMO, the
AVIC code is complex enough as it is.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/5] x86: KVM: SVM: add support for Invalid IPI Vector interception
2023-09-28 15:04 ` [PATCH 2/5] x86: KVM: SVM: add support for Invalid IPI Vector interception Maxim Levitsky
@ 2023-09-28 15:46 ` Sean Christopherson
0 siblings, 0 replies; 14+ messages in thread
From: Sean Christopherson @ 2023-09-28 15:46 UTC (permalink / raw)
To: Maxim Levitsky
Cc: kvm, Will Deacon, Borislav Petkov, Dave Hansen,
Suravee Suthikulpanit, Thomas Gleixner, Paolo Bonzini, x86,
Robin Murphy, iommu, Ingo Molnar, Joerg Roedel, H. Peter Anvin,
linux-kernel, stable
On Thu, Sep 28, 2023, Maxim Levitsky wrote:
> In later revisions of AMD's APM, there is a new 'incomplete IPI' exit code:
>
> "Invalid IPI Vector - The vector for the specified IPI was set to an
> illegal value (VEC < 16)"
>
> Note that tests on Zen2 machine show that this VM exit doesn't happen and
> instead AVIC just does nothing.
>
> Add support for this exit code by doing nothing, instead of filling
> the kernel log with errors.
>
> Also replace an unthrottled 'pr_err()' if another unknown incomplete
> IPI exit happens with WARN_ON_ONCE()
>
> (e.g in case AMD adds yet another 'Invalid IPI' exit reason)
>
> Cc: <stable@vger.kernel.org>
>
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
> arch/x86/include/asm/svm.h | 1 +
> arch/x86/kvm/svm/avic.c | 5 ++++-
> 2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index 19bf955b67e0da0..3ac0ffc4f3e202b 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -268,6 +268,7 @@ enum avic_ipi_failure_cause {
> AVIC_IPI_FAILURE_TARGET_NOT_RUNNING,
> AVIC_IPI_FAILURE_INVALID_TARGET,
> AVIC_IPI_FAILURE_INVALID_BACKING_PAGE,
> + AVIC_IPI_FAILURE_INVALID_IPI_VECTOR,
> };
>
> #define AVIC_PHYSICAL_MAX_INDEX_MASK GENMASK_ULL(8, 0)
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 2092db892d7d052..c44b65af494e3ff 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -529,8 +529,11 @@ int avic_incomplete_ipi_interception(struct kvm_vcpu *vcpu)
> case AVIC_IPI_FAILURE_INVALID_BACKING_PAGE:
> WARN_ONCE(1, "Invalid backing page\n");
> break;
> + case AVIC_IPI_FAILURE_INVALID_IPI_VECTOR:
> + /* Invalid IPI with vector < 16 */
> + break;
> default:
> - pr_err("Unknown IPI interception\n");
> + WARN_ONCE(1, "Unknown avic incomplete IPI interception\n");
Hrm, I'm not sure KVM should WARN here. E.g. if someone runs with panic_on_warn=1,
running on new hardware might crash the host. I hope that AMD is smart enough to
make any future failure types "optional" in the sense that they're either opt-in,
or are largely informational-only (like AVIC_IPI_FAILURE_INVALID_IPI_VECTOR).
I think switching to vcpu_unimpl(), or maybe even pr_err_once(), is more appropriate.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/5] x86: KVM: SVM: fix for x2avic CVE-2023-5090
2023-09-28 15:04 ` [PATCH 1/5] x86: KVM: SVM: fix for x2avic CVE-2023-5090 Maxim Levitsky
@ 2023-09-28 15:53 ` Sean Christopherson
0 siblings, 0 replies; 14+ messages in thread
From: Sean Christopherson @ 2023-09-28 15:53 UTC (permalink / raw)
To: Maxim Levitsky
Cc: kvm, Will Deacon, Borislav Petkov, Dave Hansen,
Suravee Suthikulpanit, Thomas Gleixner, Paolo Bonzini, x86,
Robin Murphy, iommu, Ingo Molnar, Joerg Roedel, H. Peter Anvin,
linux-kernel, stable
KVM: SVM: for the shortlog scope
And my preference is to have the shortlog explain the code change and leave the
CVE reference to a line in the changelog. CVE numbers are meaningless without
context, e.g. listing the CVE isn't going to be at all helpful for future readers
that look at shortlogs.
E.g.
KVM: SVM: Always refresh x2APIC MSR intercepts when x2AVIC is enabled
or
KVM: SVM: Update MSR intercepts for x2AVIC when guest disables x2APIC
On Thu, Sep 28, 2023, Maxim Levitsky wrote:
> The following problem exists since the x2avic was enabled in the KVM:
Just "x2avic"
> svm_set_x2apic_msr_interception is called to enable the interception of
() after functions
> the x2apic msrs.
>
> In particular it is called at the moment the guest resets its apic.
>
> Assuming that the guest's apic was in x2apic mode, the reset will bring
> it back to the xapic mode.
>
> The svm_set_x2apic_msr_interception however has an erroneous check for
> '!apic_x2apic_mode()' which prevents it from doing anything in this case.
>
> As a result of this, all x2apic msrs are left unintercepted, and that
> exposes the bare metal x2apic (if enabled) to the guest.
> Oops.
>
> Remove the erroneous '!apic_x2apic_mode()' check to fix that.
>
> Cc: stable@vger.kernel.org
Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode")
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
> arch/x86/kvm/svm/svm.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 9507df93f410a63..acdd0b89e4715a3 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -913,8 +913,7 @@ void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool intercept)
> if (intercept == svm->x2avic_msrs_intercepted)
> return;
>
> - if (!x2avic_enabled ||
> - !apic_x2apic_mode(svm->vcpu.arch.apic))
> + if (!x2avic_enabled)
> return;
>
> for (i = 0; i < MAX_DIRECT_ACCESS_MSRS; i++) {
> --
> 2.26.3
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3/5] x86: KVM: SVM: refresh AVIC inhibition in svm_leave_nested()
2023-09-28 15:04 ` [PATCH 3/5] x86: KVM: SVM: refresh AVIC inhibition in svm_leave_nested() Maxim Levitsky
@ 2023-09-28 16:03 ` Sean Christopherson
0 siblings, 0 replies; 14+ messages in thread
From: Sean Christopherson @ 2023-09-28 16:03 UTC (permalink / raw)
To: Maxim Levitsky
Cc: kvm, Will Deacon, Borislav Petkov, Dave Hansen,
Suravee Suthikulpanit, Thomas Gleixner, Paolo Bonzini, x86,
Robin Murphy, iommu, Ingo Molnar, Joerg Roedel, H. Peter Anvin,
linux-kernel, stable
On Thu, Sep 28, 2023, Maxim Levitsky wrote:
> svm_leave_nested() similar to a nested VM exit, get the vCPU out of nested
> mode and thus should end the local inhibition of AVIC on this vCPU.
>
> Failure to do so, can lead to hangs on guest reboot.
>
> Raise the KVM_REQ_APICV_UPDATE request to refresh the AVIC state of the
> current vCPU in this case.
>
> Cc: stable@vger.kernel.org
Unnecessary newline.
Fixes: f44509f849fe ("KVM: x86: SVM: allow AVIC to co-exist with a nested guest running")
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
> arch/x86/kvm/svm/nested.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index dd496c9e5f91f28..3fea8c47679e689 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -1253,6 +1253,9 @@ void svm_leave_nested(struct kvm_vcpu *vcpu)
>
> nested_svm_uninit_mmu_context(vcpu);
> vmcb_mark_all_dirty(svm->vmcb);
> +
> + if (kvm_apicv_activated(vcpu->kvm))
> + kvm_make_request(KVM_REQ_APICV_UPDATE, vcpu);
> }
>
> kvm_clear_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
> --
> 2.26.3
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 4/5] iommu/amd: skip updating the IRTE entry when is_run is already false
2023-09-28 15:04 ` [PATCH 4/5] iommu/amd: skip updating the IRTE entry when is_run is already false Maxim Levitsky
@ 2023-09-28 17:27 ` Joao Martins
0 siblings, 0 replies; 14+ messages in thread
From: Joao Martins @ 2023-09-28 17:27 UTC (permalink / raw)
To: Maxim Levitsky, kvm
Cc: Will Deacon, Borislav Petkov, Dave Hansen, Suravee Suthikulpanit,
Thomas Gleixner, Paolo Bonzini, x86, Robin Murphy, iommu,
Ingo Molnar, Joerg Roedel, Sean Christopherson, H. Peter Anvin,
linux-kernel
On 28/09/2023 16:04, Maxim Levitsky wrote:
> When vCPU affinity of an IRTE which already has
> is_run == false, is updated and the update also sets is_run to false,
> there is nothing to do.
>
> The goal of this patch is to make a call to 'amd_iommu_update_ga()'
> to be relatively cheap if there is nothing to do.
>
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> drivers/iommu/amd/iommu.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index 95bd7c25ba6f366..10bcd436e984672 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -3774,6 +3774,15 @@ int amd_iommu_update_ga(int cpu, bool is_run, void *data)
> entry->hi.fields.destination =
> APICID_TO_IRTE_DEST_HI(cpu);
> }
> +
> + if (!is_run && !entry->lo.fields_vapic.is_run) {
> + /*
> + * No need to notify the IOMMU about an entry which
> + * already has is_run == False
> + */
> + return 0;
> + }
> +
> entry->lo.fields_vapic.is_run = is_run;
>
> return modify_irte_ga(ir_data->iommu, ir_data->irq_2_irte.devid,
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/5] AVIC bugfixes and workarounds
2023-09-28 15:04 [PATCH 0/5] AVIC bugfixes and workarounds Maxim Levitsky
` (4 preceding siblings ...)
2023-09-28 15:04 ` [PATCH 5/5] x86: KVM: SVM: workaround for AVIC's errata #1235 Maxim Levitsky
@ 2024-03-26 3:15 ` Jim Mattson
2024-03-26 15:59 ` mlevitsk
5 siblings, 1 reply; 14+ messages in thread
From: Jim Mattson @ 2024-03-26 3:15 UTC (permalink / raw)
To: Maxim Levitsky
Cc: kvm, Will Deacon, Borislav Petkov, Dave Hansen,
Suravee Suthikulpanit, Thomas Gleixner, Paolo Bonzini, x86,
Robin Murphy, iommu, Ingo Molnar, Joerg Roedel,
Sean Christopherson, H. Peter Anvin, linux-kernel, David Rientjes
On Thu, Sep 28, 2023 at 8:05 AM Maxim Levitsky <mlevitsk@redhat.com> wrote:
>
> Hi!
>
> This patch series includes several fixes to AVIC I found while working
> on a new version of nested AVIC code.
>
> Also while developing it I realized that a very simple workaround for
> AVIC's errata #1235 exists and included it in this patch series as well.
>
> Best regards,
> Maxim Levitsky
Can someone explain why we're still unwilling to enable AVIC by
default? Have the performance issues that plagued the Rome
implementation been fixed? What is AMD's guidance?
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/5] AVIC bugfixes and workarounds
2024-03-26 3:15 ` [PATCH 0/5] AVIC bugfixes and workarounds Jim Mattson
@ 2024-03-26 15:59 ` mlevitsk
2024-03-26 16:52 ` Joao Martins
0 siblings, 1 reply; 14+ messages in thread
From: mlevitsk @ 2024-03-26 15:59 UTC (permalink / raw)
To: Jim Mattson
Cc: kvm, Will Deacon, Borislav Petkov, Dave Hansen,
Suravee Suthikulpanit, Thomas Gleixner, Paolo Bonzini, x86,
Robin Murphy, iommu, Ingo Molnar, Joerg Roedel,
Sean Christopherson, H. Peter Anvin, linux-kernel, David Rientjes
On Mon, 2024-03-25 at 20:15 -0700, Jim Mattson wrote:
> > On Thu, Sep 28, 2023 at 8:05 AM Maxim Levitsky <mlevitsk@redhat.com> wrote:
> > > >
> > > > Hi!
> > > >
> > > > This patch series includes several fixes to AVIC I found while working
> > > > on a new version of nested AVIC code.
> > > >
> > > > Also while developing it I realized that a very simple workaround for
> > > > AVIC's errata #1235 exists and included it in this patch series as well.
> > > >
> > > > Best regards,
> > > > Maxim Levitsky
> >
> > Can someone explain why we're still unwilling to enable AVIC by
> > default? Have the performance issues that plagued the Rome
> > implementation been fixed? What is AMD's guidance?
> >
Hi
This is what I know:
Zen1:
I never tested it, so I don't know how well AVIC works there and if it has any erratas.
Zen2:
Has CPU errata in regard to IPI virtualization that makes it unusable in production,
but if AVIC's IPI virtualization (borrowing the Intel term here) is disabled,
then it works just fine and 1:1 equivalent to APICv without IPI.
I posted patches for this several times, latest version is here, it still applies I think:
https://lkml.iu.edu/hypermail/linux/kernel/2310.0/00790.html
Zen3:
For some reason AVIC got disabled by AMD in CPUID. It is still there though and force_avic=1 kvm_amd option
can make KVM use it and AFAIK it works just fine.
It is possible that it got disabled due to Zen2 errata that is fixed on Zen3,
but maybe AMD wasn't sure back then that it will be fixed or it might be due to performance issues with broadcast
IPIs which I think ended up being a software issue and was fixed a long time ago.
Zen4+
I haven't tested it much, but AFAIK it should work out of the box. It also got x2avic mode which allows
to use AVIC with VMs that have more that 254 vCPUs.
IMHO if we merge the workaround I have for IPI virtualization and make IPI virtualization off for Zen2
(and maybe Zen1 as well), then I don't see why we can't make AVIC be the default on.
Best regards,
Maxim Levitsky
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/5] AVIC bugfixes and workarounds
2024-03-26 15:59 ` mlevitsk
@ 2024-03-26 16:52 ` Joao Martins
0 siblings, 0 replies; 14+ messages in thread
From: Joao Martins @ 2024-03-26 16:52 UTC (permalink / raw)
To: mlevitsk, Jim Mattson
Cc: kvm, Will Deacon, Borislav Petkov, Dave Hansen,
Suravee Suthikulpanit, Thomas Gleixner, Paolo Bonzini, x86,
Robin Murphy, iommu, Ingo Molnar, Joerg Roedel,
Sean Christopherson, H. Peter Anvin, linux-kernel, David Rientjes
On 26/03/2024 15:59, mlevitsk@redhat.com wrote:
> On Mon, 2024-03-25 at 20:15 -0700, Jim Mattson wrote:
>>> On Thu, Sep 28, 2023 at 8:05 AM Maxim Levitsky <mlevitsk@redhat.com> wrote:
>>>>>
>>>>> Hi!
>>>>>
>>>>> This patch series includes several fixes to AVIC I found while working
>>>>> on a new version of nested AVIC code.
>>>>>
>>>>> Also while developing it I realized that a very simple workaround for
>>>>> AVIC's errata #1235 exists and included it in this patch series as well.
>>>>>
>>>>> Best regards,
>>>>> Maxim Levitsky
>>>
>>> Can someone explain why we're still unwilling to enable AVIC by
>>> default? Have the performance issues that plagued the Rome
>>> implementation been fixed? What is AMD's guidance?
>>>
> Hi
>
> This is what I know:
>
> Zen1:
> I never tested it, so I don't know how well AVIC works there and if it has any erratas.
>
> Zen2:
> Has CPU errata in regard to IPI virtualization that makes it unusable in production,
> but if AVIC's IPI virtualization (borrowing the Intel term here) is disabled,
> then it works just fine and 1:1 equivalent to APICv without IPI.
>
> I posted patches for this several times, latest version is here, it still applies I think:
> https://lkml.iu.edu/hypermail/linux/kernel/2310.0/00790.html
>
> Zen3:
> For some reason AVIC got disabled by AMD in CPUID. It is still there though and force_avic=1 kvm_amd option
> can make KVM use it and AFAIK it works just fine.
>
> It is possible that it got disabled due to Zen2 errata that is fixed on Zen3,
> but maybe AMD wasn't sure back then that it will be fixed or it might be due to performance issues with broadcast
> IPIs which I think ended up being a software issue and was fixed a long time ago.
>
> Zen4+
> I haven't tested it much, but AFAIK it should work out of the box. It also got x2avic mode which allows
> to use AVIC with VMs that have more that 254 vCPUs.
>
> IMHO if we merge the workaround I have for IPI virtualization and make IPI virtualization off for Zen2
> (and maybe Zen1 as well), then I don't see why we can't make AVIC be the default on.
Additionally, I think right now with avic=1 it fails vcpu creation when creating
more a vcpu with an id bigger than what's supported i.e. the MAX_VCPUS we
currently advertised in UAPI isn't quite honored. So that's the only wrinkle at
least I am aware. Sean had send this one series to inhibit AVIC when such config:
https://lore.kernel.org/linux-iommu/20230815213533.548732-1-seanjc@google.com/#r
But I am not sure if it was respinned.
Joao
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2024-03-26 16:53 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-28 15:04 [PATCH 0/5] AVIC bugfixes and workarounds Maxim Levitsky
2023-09-28 15:04 ` [PATCH 1/5] x86: KVM: SVM: fix for x2avic CVE-2023-5090 Maxim Levitsky
2023-09-28 15:53 ` Sean Christopherson
2023-09-28 15:04 ` [PATCH 2/5] x86: KVM: SVM: add support for Invalid IPI Vector interception Maxim Levitsky
2023-09-28 15:46 ` Sean Christopherson
2023-09-28 15:04 ` [PATCH 3/5] x86: KVM: SVM: refresh AVIC inhibition in svm_leave_nested() Maxim Levitsky
2023-09-28 16:03 ` Sean Christopherson
2023-09-28 15:04 ` [PATCH 4/5] iommu/amd: skip updating the IRTE entry when is_run is already false Maxim Levitsky
2023-09-28 17:27 ` Joao Martins
2023-09-28 15:04 ` [PATCH 5/5] x86: KVM: SVM: workaround for AVIC's errata #1235 Maxim Levitsky
2023-09-28 15:37 ` Sean Christopherson
2024-03-26 3:15 ` [PATCH 0/5] AVIC bugfixes and workarounds Jim Mattson
2024-03-26 15:59 ` mlevitsk
2024-03-26 16:52 ` Joao Martins
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).