Kernel KVM virtualization development

* Re: [PATCH V2] KVM: SEV: Update SEV-ES shutdown intercepts with more metadata
From: Tom Lendacky @ 2023-09-06 20:19 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Peter Gonda, kvm, Paolo Bonzini, Joerg Roedel, Borislav Petkov,
	x86, linux-kernel
In-Reply-To: <ZPjc/PoBLPNNLukt@google.com>

On 9/6/23 15:11, Sean Christopherson wrote:
> On Wed, Sep 06, 2023, Tom Lendacky wrote:
>> On 9/6/23 10:14, Peter Gonda wrote:
>>> Currently if an SEV-ES VM shuts down userspace sees KVM_RUN struct with
>>
>> s/down userspace/down, userspace/
> 
> Heh, yeah, I read that the same way you did.
> 
>>> only the INVALID_ARGUMENT. This is a very limited amount of information
>>> to debug the situation. Instead KVM can return a
>>> KVM_EXIT_SHUTDOWN to alert userspace the VM is shutting down and
>>> is not usable any further.
>>>
>>> Signed-off-by: Peter Gonda <pgonda@google.com>
>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>> Cc: Sean Christopherson <seanjc@google.com>
>>> Cc: Tom Lendacky <thomas.lendacky@amd.com>
>>> Cc: Joerg Roedel <joro@8bytes.org>
>>> Cc: Borislav Petkov <bp@alien8.de>
>>> Cc: x86@kernel.org
>>> Cc: kvm@vger.kernel.org
>>> Cc: linux-kernel@vger.kernel.org
>>>
>>> ---
>>>    arch/x86/kvm/svm/svm.c | 8 +++++---
>>>    1 file changed, 5 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>>> index 956726d867aa..cecf6a528c9b 100644
>>> --- a/arch/x86/kvm/svm/svm.c
>>> +++ b/arch/x86/kvm/svm/svm.c
>>> @@ -2131,12 +2131,14 @@ static int shutdown_interception(struct kvm_vcpu *vcpu)
>>>    	 * The VM save area has already been encrypted so it
>>>    	 * cannot be reinitialized - just terminate.
>>>    	 */
>>> -	if (sev_es_guest(vcpu->kvm))
>>> -		return -EINVAL;
>>> +	if (sev_es_guest(vcpu->kvm)) {
>>> +		kvm_run->exit_reason = KVM_EXIT_SHUTDOWN;
>>> +		return 0;
>>> +	}
>>
>> Just a nit... feel free to ignore, but, since KVM_EXIT_SHUTDOWN is also set
>> at the end of the function and I don't think kvm_vcpu_reset() clears the
>> value from kvm_run, you could just set kvm_run->exit_reason on entry and
>> just return 0 early for an SEV-ES guest.
> 
> kvm_run is writable by userspace though, so KVM can't rely on kvm_run->exit_reason
> for correctness.
> 
> And IIUC, the VMSA is also toast, i.e. doing anything other than marking the VM
> dead is futile, no?

I was just saying that "kvm_run->exit_reason = KVM_EXIT_SHUTDOWN;" is in 
the shutdown_interception() function twice now (at both exit points of the 
function) and can probably just be moved to the top of the function and be 
common for both exit points, now, right?

I'm not saying to get rid of it, just set it sooner.

Thanks,
Tom
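
The restructuring suggested above can be sketched as follows. This is a minimal stand-alone model, not the code in arch/x86/kvm/svm/svm.c: every mock_* and MOCK_* name is hypothetical, the reset helper is a stub, and the value 8 for the shutdown exit reason is taken from the KVM uapi header as an assumption. It only shows that setting exit_reason once on entry covers both return paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; not the real KVM definitions. */
#define MOCK_KVM_EXIT_UNKNOWN  0u
#define MOCK_KVM_EXIT_SHUTDOWN 8u

struct mock_kvm_run {
	uint32_t exit_reason;
};

struct mock_vcpu {
	struct mock_kvm_run *run;
	bool sev_es;     /* models sev_es_guest(vcpu->kvm) */
	bool was_reset;  /* records whether the INIT/reset path ran */
};

/* Stub for the reset that the non-SEV-ES path performs. */
static void mock_vcpu_reset(struct mock_vcpu *vcpu)
{
	vcpu->was_reset = true;
}

static int mock_shutdown_interception(struct mock_vcpu *vcpu)
{
	assert(vcpu && vcpu->run);

	/* Set once, up front, so both exit points report SHUTDOWN. */
	vcpu->run->exit_reason = MOCK_KVM_EXIT_SHUTDOWN;

	/* SEV-ES: the save area is encrypted, so skip the reset. */
	if (vcpu->sev_es)
		return 0;

	mock_vcpu_reset(vcpu);
	return 0;
}
```

In this model the return value 0 stands for "exit to userspace"; the exit reason is written by the handler on every call, so a stale value left in the run structure is never what gets reported.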



* Re: Linux 6.5 speed regression, boot VERY slow with anything systemd related
From: Marc Haber @ 2023-09-06 20:18 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Sean Christopherson, Bagas Sanjaya, linux-kernel,
	Linux Regressions, Linux KVM, Paolo Bonzini
In-Reply-To: <20230906152616.GE11676@atomide.com>

On Wed, Sep 06, 2023 at 06:26:16PM +0300, Tony Lindgren wrote:
> * Marc Haber <mh+linux-kernel@zugschlus.de> [230906 14:41]:
> > With my tools I have found out that it really seems to be related to the
> > CPU of the host. I have changed my VM definition to "copy host CPU
> > configuration to VM" in libvirt and have moved this very VM (image and
> > settings) to hosts with a "Ryzen 5 Pro 4650G" and to an "Intel Xeon
> > E3-1246" where they work flawlessly, while on both APUs I have available
> > ("AMD G-T40E" and "AMD GX-412TC SOC") the regression in 6.5 shows. And
> > if I boot other VMs on the APUs with 6.5 the issue comes up. It is a
> > clear regression since going back to 4.6's serial code solves the issue
> > on the APUs.
> 
> Not sure why the CPU matters here..

Neither am I.

> One thing to check is if you have these in your .config:
> 
> CONFIG_SERIAL_CORE=y
> CONFIG_SERIAL_CORE_CONSOLE=y

That's affirmative. Otherwise, I think that serial console wouldn't work
at all, but I get early kernel messages just fine and even at normal
speed.

> Or do you maybe have CONFIG_SERIAL_CORE=m as loadable module?

Negative.

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


* Re: [PATCH V2] KVM: SEV: Update SEV-ES shutdown intercepts with more metadata
From: Sean Christopherson @ 2023-09-06 20:11 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Peter Gonda, kvm, Paolo Bonzini, Joerg Roedel, Borislav Petkov,
	x86, linux-kernel
In-Reply-To: <68a44c6d-21c9-30c2-b0cf-66f02f9d2f4e@amd.com>

On Wed, Sep 06, 2023, Tom Lendacky wrote:
> On 9/6/23 10:14, Peter Gonda wrote:
> > Currently if an SEV-ES VM shuts down userspace sees KVM_RUN struct with
> 
> s/down userspace/down, userspace/

Heh, yeah, I read that the same way you did.

> > only the INVALID_ARGUMENT. This is a very limited amount of information
> > to debug the situation. Instead KVM can return a
> > KVM_EXIT_SHUTDOWN to alert userspace the VM is shutting down and
> > is not usable any further.
> > 
> > Signed-off-by: Peter Gonda <pgonda@google.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Sean Christopherson <seanjc@google.com>
> > Cc: Tom Lendacky <thomas.lendacky@amd.com>
> > Cc: Joerg Roedel <joro@8bytes.org>
> > Cc: Borislav Petkov <bp@alien8.de>
> > Cc: x86@kernel.org
> > Cc: kvm@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > 
> > ---
> >   arch/x86/kvm/svm/svm.c | 8 +++++---
> >   1 file changed, 5 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 956726d867aa..cecf6a528c9b 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -2131,12 +2131,14 @@ static int shutdown_interception(struct kvm_vcpu *vcpu)
> >   	 * The VM save area has already been encrypted so it
> >   	 * cannot be reinitialized - just terminate.
> >   	 */
> > -	if (sev_es_guest(vcpu->kvm))
> > -		return -EINVAL;
> > +	if (sev_es_guest(vcpu->kvm)) {
> > +		kvm_run->exit_reason = KVM_EXIT_SHUTDOWN;
> > +		return 0;
> > +	}
> 
> Just a nit... feel free to ignore, but, since KVM_EXIT_SHUTDOWN is also set
> at the end of the function and I don't think kvm_vcpu_reset() clears the
> value from kvm_run, you could just set kvm_run->exit_reason on entry and
> just return 0 early for an SEV-ES guest.

kvm_run is writable by userspace though, so KVM can't rely on kvm_run->exit_reason
for correctness.

And IIUC, the VMSA is also toast, i.e. doing anything other than marking the VM
dead is futile, no?


* Re: [PATCH] KVM: X86: Reduce calls to vcpu_load
From: Sean Christopherson @ 2023-09-06 20:08 UTC (permalink / raw)
  To: Xiaoyao Li; +Cc: Hao Peng, pbonzini, kvm, linux-kernel
In-Reply-To: <10bdaf6d-1c5c-6502-c340-db3f84bf74a1@intel.com>

On Wed, Sep 06, 2023, Xiaoyao Li wrote:
> On 9/6/2023 2:24 PM, Hao Peng wrote:
> > From: Peng Hao <flyingpeng@tencent.com>
> > 
> > The call of vcpu_load/put takes about 1-2us. Each
> > kvm_arch_vcpu_create will call vcpu_load/put
> > to initialize some fields of vmcs, which can be
> > delayed until the call of vcpu_ioctl to process
> > this part of the vmcs field, which can reduce calls
> > to vcpu_load.
> 
> what if no vcpu ioctl is called after vcpu creation?
> 
> And will the first (it was second before this patch) vcpu_load() becomes
> longer? have you measured it?

I don't think the first vcpu_load() becomes longer, this avoids an entire
load()+put() pair by doing the initialization in the first ioctl().

That said, the patch is obviously buggy, it hooks kvm_arch_vcpu_ioctl() instead
of kvm_vcpu_ioctl(), e.g. doing KVM_RUN, KVM_SET_SREGS, etc. will cause explosions.

It will also break the TSC synchronization logic in kvm_arch_vcpu_postcreate(),
which can "race" with ioctls() as the vCPU file descriptor is accessible by
userspace the instant it's installed into the fd tables, i.e. userspace doesn't
have to wait for KVM_CREATE_VCPU to complete.

And I gotta imagine there are other interactions I haven't thought of off the
top of my head, e.g. the vCPU is also reachable via kvm_for_each_vcpu().  All it
takes is one path that touches a lazily initialized field for this to fall apart.

> I don't think it worth the optimization unless a strong reason.

Yeah, this is a lot of subtle complexity to shave 1-2us.
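
The hazard described above can be illustrated with a toy model. None of this is KVM code: the toy_* names, the field and the value 100 are invented for illustration. It assumes a deferred-initialization hook placed in only one entry point, and shows that any other path reaching the object first sees uninitialized state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy object with one lazily initialized field (hypothetical). */
struct toy_vcpu {
	bool hw_state_ready;  /* set once the deferred setup has run */
	int  tsc_offset;      /* stands in for any lazily initialized field */
};

/* Deferred setup, meant to run "on first use". */
static void toy_deferred_init(struct toy_vcpu *v)
{
	assert(v != NULL);
	if (!v->hw_state_ready) {
		v->tsc_offset = 100;
		v->hw_state_ready = true;
	}
}

/* Path A: the one entry point that carries the lazy-init hook. */
static int toy_arch_ioctl(struct toy_vcpu *v)
{
	toy_deferred_init(v);
	return v->tsc_offset;
}

/*
 * Path B: a different entry point that was not hooked. It returns -1
 * to flag that it would have consumed uninitialized state.
 */
static int toy_run(struct toy_vcpu *v)
{
	assert(v != NULL);
	if (!v->hw_state_ready)
		return -1;
	return v->tsc_offset;
}
```

Calling toy_run() on a fresh object reports the problem, while calling it after toy_arch_ioctl() works; the result depends entirely on which path userspace happens to take first.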


* Re: [PATCH 00/13] Implement support for IBS virtualization
From: Peter Zijlstra @ 2023-09-06 19:56 UTC (permalink / raw)
  To: Manali Shukla
  Cc: kvm, seanjc, linux-doc, linux-perf-users, x86, pbonzini, bp,
	santosh.shukla, ravi.bangoria, thomas.lendacky, nikunj
In-Reply-To: <012c9897-51d7-87d3-e0e5-3856fa9644e5@amd.com>

On Wed, Sep 06, 2023 at 09:08:25PM +0530, Manali Shukla wrote:
> Hi Peter,
> 
> Thank you for looking into this.
> 
> On 9/5/2023 9:17 PM, Peter Zijlstra wrote:
> > On Mon, Sep 04, 2023 at 09:53:34AM +0000, Manali Shukla wrote:
> > 
> >> Note that, since IBS registers are swap type C [2], the hypervisor is
> >> responsible for saving and restoring of IBS host state. Hypervisor
> >> does so only when IBS is active on the host to avoid unnecessary
> >> rdmsrs/wrmsrs. Hypervisor needs to disable host IBS before saving the
> >> state and enter the guest. After a guest exit, the hypervisor needs to
> >> restore host IBS state and re-enable IBS.
> > 
> > Why do you think it is OK for a guest to disable the host IBS when
> > entering a guest? Perhaps the host was wanting to profile the guest.
> > 
> 
> 1. Since IBS registers are of swap type C [1], only guest state is saved
> and restored by the hardware. Host state needs to be saved and restored by
> hypervisor. In order to save IBS registers correctly, IBS needs to be
> disabled before saving the IBS registers.
> 
> 2. As per APM [2],
> "When a VMRUN is executed to an SEV-ES guest with IBS virtualization enabled, the
> IbsFetchCtl[IbsFetchEn] and IbsOpCtl[IbsOpEn] MSR bits must be 0. If either of 
> these bits are not 0, the VMRUN will fail with a VMEXIT_INVALID error code."
> This is enforced by hardware on SEV-ES guests when VIBS is enabled on SEV-ES
> guests.

I'm not sure I'm fluent in virt speak (in fact, I'm sure I'm not). Is
the above saying that a host can never IBS profile a guest?

Does the current IBS thing assert perf_event_attr::exclude_guest is set?

I can't quickly find anything :-(
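
The host-side sequence described in the quoted text (disable IBS, save host state, enter the guest, then restore and re-enable) can be modelled roughly as below. This is a sketch, not kernel code: the toy_* names are hypothetical, the MSRs are plain struct fields, and the enable bit positions (bit 48 for IbsFetchEn, bit 17 for IbsOpEn) are an assumption about the AMD register layout that the thread itself does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed enable bits; not taken from the thread. */
#define TOY_IBS_FETCH_EN (1ULL << 48)
#define TOY_IBS_OP_EN    (1ULL << 17)

/* Plain fields standing in for the IbsFetchCtl and IbsOpCtl MSRs. */
struct toy_ibs_state {
	uint64_t fetch_ctl;
	uint64_t op_ctl;
};

struct toy_ibs_saved {
	struct toy_ibs_state regs;  /* snapshot taken with IBS disabled */
	bool fetch_was_on;
	bool op_was_on;
};

/* Models the quoted rule: VMRUN fails if either enable bit is set. */
static bool toy_vmrun_allowed(const struct toy_ibs_state *hw)
{
	assert(hw);
	return !(hw->fetch_ctl & TOY_IBS_FETCH_EN) &&
	       !(hw->op_ctl & TOY_IBS_OP_EN);
}

/* Disable first, then snapshot, as the quoted text describes. */
static struct toy_ibs_saved toy_host_save(struct toy_ibs_state *hw)
{
	struct toy_ibs_saved s;

	assert(hw);
	s.fetch_was_on = (hw->fetch_ctl & TOY_IBS_FETCH_EN) != 0;
	s.op_was_on = (hw->op_ctl & TOY_IBS_OP_EN) != 0;
	hw->fetch_ctl &= ~TOY_IBS_FETCH_EN;
	hw->op_ctl &= ~TOY_IBS_OP_EN;
	s.regs = *hw;
	return s;
}

/* After guest exit: restore the snapshot, then re-enable if needed. */
static void toy_host_restore(struct toy_ibs_state *hw,
			     const struct toy_ibs_saved *s)
{
	assert(hw && s);
	*hw = s->regs;
	if (s->fetch_was_on)
		hw->fetch_ctl |= TOY_IBS_FETCH_EN;
	if (s->op_was_on)
		hw->op_ctl |= TOY_IBS_OP_EN;
}
```

In this model host IBS is off for the whole window between save and restore, which is exactly the interval the question above is about.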


* [ANNOUNCE] KVM LPC Microconference Call for Abstracts closing on Friday, September 8th
From: Sean Christopherson @ 2023-09-06 19:56 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, Sean Christopherson, Paolo Bonzini

The call for abstracts for the KVM Microconference will "officially" close this
Friday, September 8th.  We will review submissions next week and publish the
schedule no later than September 15th.

Apologies for the short notice, I was originally thinking we'd wait to publish
the schedule until October 1st, but we ultimately decided to go with September 15th
to give everyone more time to prepare, book travel, etc.

Thanks!


* Re: [PATCH V2] KVM: SEV: Update SEV-ES shutdown intercepts with more metadata
From: Tom Lendacky @ 2023-09-06 19:18 UTC (permalink / raw)
  To: Peter Gonda, kvm
  Cc: Paolo Bonzini, Sean Christopherson, Joerg Roedel, Borislav Petkov,
	x86, linux-kernel
In-Reply-To: <20230906151449.18312-1-pgonda@google.com>

On 9/6/23 10:14, Peter Gonda wrote:
> Currently if an SEV-ES VM shuts down userspace sees KVM_RUN struct with

s/down userspace/down, userspace/

> only the INVALID_ARGUMENT. This is a very limited amount of information
> to debug the situation. Instead KVM can return a
> KVM_EXIT_SHUTDOWN to alert userspace the VM is shutting down and
> is not usable any further.
> 
> Signed-off-by: Peter Gonda <pgonda@google.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Sean Christopherson <seanjc@google.com>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> 
> ---
>   arch/x86/kvm/svm/svm.c | 8 +++++---
>   1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 956726d867aa..cecf6a528c9b 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2131,12 +2131,14 @@ static int shutdown_interception(struct kvm_vcpu *vcpu)
>   	 * The VM save area has already been encrypted so it
>   	 * cannot be reinitialized - just terminate.
>   	 */
> -	if (sev_es_guest(vcpu->kvm))
> -		return -EINVAL;
> +	if (sev_es_guest(vcpu->kvm)) {
> +		kvm_run->exit_reason = KVM_EXIT_SHUTDOWN;
> +		return 0;
> +	}

Just a nit... feel free to ignore, but, since KVM_EXIT_SHUTDOWN is also 
set at the end of the function and I don't think kvm_vcpu_reset() clears 
the value from kvm_run, you could just set kvm_run->exit_reason on entry 
and just return 0 early for an SEV-ES guest.

Overall, though:

Acked-by: Tom Lendacky <thomas.lendacky@amd.com>

Thanks,
Tom

>   
>   	/*
>   	 * VMCB is undefined after a SHUTDOWN intercept.  INIT the vCPU to put
> -	 * the VMCB in a known good state.  Unfortuately, KVM doesn't have
> +	 * the VMCB in a known good state.  Unfortunately, KVM doesn't have
>   	 * KVM_MP_STATE_SHUTDOWN and can't add it without potentially breaking
>   	 * userspace.  At a platform view, INIT is acceptable behavior as
>   	 * there exist bare metal platforms that automatically INIT the CPU


* Re: [PATCH 00/10] RISC-V: Refactor instructions
From: Charlie Jenkins @ 2023-09-06 18:51 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Palmer Dabbelt, jrtc27, linux-riscv, linux-kernel, kvm, kvm-riscv,
	bpf, Paul Walmsley, aou, peterz, jpoimboe, jbaron, rostedt,
	Ard Biesheuvel, anup, atishp, ast, daniel, andrii, martin.lau,
	song, yhs, john.fastabend, kpsingh, sdf, haoluo, jolsa, bjorn,
	luke.r.nels, xi.wang, namcaov
In-Reply-To: <20230818-63347af7195b7385c146778d@orel>

On Fri, Aug 18, 2023 at 09:30:32AM +0200, Andrew Jones wrote:
> On Thu, Aug 17, 2023 at 10:52:22AM -0700, Palmer Dabbelt wrote:
> > On Thu, 17 Aug 2023 09:43:16 PDT (-0700), Charlie Jenkins wrote:
> ...
> > > It seems to me that it will be significantly more challenging to use
> > > riscv-opcodes than it would for people to just hand create the macros
> > > that they need.
> > 
> > Ya, riscv-opcodes is pretty crusty.  We stopped using it elsewhere ages ago.
> 
> Ah, pity I didn't know the history of it or I wouldn't have suggested it,
> wasting Charlie's time (sorry, Charlie!). So everywhere that needs
> encodings are manually scraping them from the PDFs? Or maybe we can write
> our own parser which converts adoc/wavedrom files[1] to Linux C?
> 
> [1] https://github.com/riscv/riscv-isa-manual/tree/main/src/images/wavedrom

The problem with the wavedrom files is that there is no standard for
how each instruction is identified. The title of the adoc gives some
insight and there is generally a funct3 or specific opcode that is
associated with the instruction but it would be kind of messy to write a
script to parse that. I think manually constructing the instructions is
fine. When somebody wants to add a new instruction they probably will
not need to add very many at a time, so it should be only a couple of
lines that they will be able to test.
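
As a rough idea of what manually constructing an instruction amounts to, here is a small sketch for the R-type format. The rv_encode_rtype() helper and the RV_ADD/RV_SUB macro names are hypothetical and are not the macros from this series; the field layout and the ADD/SUB funct values follow the base ISA encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Major opcode shared by the register-register integer ops. */
#define RV_OPCODE_OP 0x33u

/*
 * R-type layout, from most to least significant bits:
 * funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0]
 */
static inline uint32_t rv_encode_rtype(uint32_t funct7, uint32_t rs2,
				       uint32_t rs1, uint32_t funct3,
				       uint32_t rd, uint32_t opcode)
{
	assert(rd < 32 && rs1 < 32 && rs2 < 32);
	assert(funct7 < 128 && funct3 < 8 && opcode < 128);
	return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) |
	       (funct3 << 12) | (rd << 7) | opcode;
}

/* Adding an instruction is one line on top of the shared encoder. */
#define RV_ADD(rd, rs1, rs2) \
	rv_encode_rtype(0x00u, (rs2), (rs1), 0x0u, (rd), RV_OPCODE_OP)
#define RV_SUB(rd, rs1, rs2) \
	rv_encode_rtype(0x20u, (rs2), (rs1), 0x0u, (rd), RV_OPCODE_OP)
```

For example, RV_ADD(1, 2, 3) encodes "add x1, x2, x3" as 0x003100b3, which can be checked against an assembler's output.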

> 
> Thanks,
> drew


* [GIT PULL] KVM changes for 6.6 merge window
From: Paolo Bonzini @ 2023-09-06 17:48 UTC (permalink / raw)
  To: torvalds; +Cc: linux-kernel, kvm

Linus,

The following changes since commit 2dde18cd1d8fac735875f2e4987f11817cc0bc2c:

  Linux 6.5 (2023-08-27 14:49:51 -0700)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus

for you to fetch changes up to d011151616e73de20c139580b73fa4c7042bd861:

  Merge branch 'kvm-x86-mmu-6.6' into HEAD (2023-09-01 15:50:38 -0400)

There are no conflicts, just one remark below about a late-ish rebase
of a topic branch.

----------------------------------------------------------------
ARM:

* Clean up vCPU targets, always returning generic v8 as the preferred target

* Trap forwarding infrastructure for nested virtualization (used for traps
  that are taken from an L2 guest and are needed by the L1 hypervisor)

* FEAT_TLBIRANGE support to only invalidate specific ranges of addresses
  when collapsing a table PTE to a block PTE.  This avoids the guest having
  to refill the TLBs for addresses that aren't covered by the table PTE.

* Fix vPMU issues related to handling of PMUver.

* Don't unnecessarily align non-stack allocations in the EL2 VA space

* Drop HCR_VIRT_EXCP_MASK, which was never used...

* Don't use smp_processor_id() in kvm_arch_vcpu_load(),
  but the cpu parameter instead

* Drop redundant call to kvm_set_pfn_accessed() in user_mem_abort()

* Remove prototypes without implementations

RISC-V:

* Zba, Zbs, Zicntr, Zicsr, Zifencei, and Zihpm support for guest

* Added ONE_REG interface for SATP mode

* Added ONE_REG interface to enable/disable multiple ISA extensions

* Improved error codes returned by ONE_REG interfaces

* Added KVM_GET_REG_LIST ioctl() implementation for KVM RISC-V

* Added get-reg-list selftest for KVM RISC-V

s390:

* PV crypto passthrough enablement (Tony, Steffen, Viktor, Janosch)
  Allows a PV guest to use crypto cards. Card access is governed by
  the firmware and once a crypto queue is "bound" to a PV VM every
  other entity (PV or not) loses access until it is not bound
  anymore. Enablement is done via flags when creating the PV VM.

* Guest debug fixes (Ilya)

x86:

* Clean up KVM's handling of Intel architectural events

* Intel bugfixes

* Add support for SEV-ES DebugSwap, allowing SEV-ES guests to use debug
  registers and generate/handle #DBs

* Clean up LBR virtualization code

* Fix a bug where KVM fails to set the target pCPU during an IRTE update

* Fix fatal bugs in SEV-ES intrahost migration

* Fix a bug where the recent (architecturally correct) change to reinject
  #BP and skip INT3 broke SEV guests (can't decode INT3 to skip it)

* Retry APIC map recalculation if a vCPU is added/enabled

* Overhaul emergency reboot code to bring SVM up to par with VMX, tie the
  "emergency disabling" behavior to KVM actually being loaded, and move all of
  the logic within KVM

* Fix user triggerable WARNs in SVM where KVM incorrectly assumes the TSC
  ratio MSR cannot diverge from the default when TSC scaling is disabled,
  and clean up related code

* Add a framework to allow "caching" feature flags so that KVM can check if
  the guest can use a feature without needing to search guest CPUID

* Rip out the ancient MMU_DEBUG crud and replace the useful bits with
  CONFIG_KVM_PROVE_MMU

* Fix KVM's handling of !visible guest roots to avoid premature triple fault
  injection

* Overhaul KVM's page-track APIs, and KVMGT's usage, to reduce the API surface
  that is needed by external users (currently only KVMGT), and fix a variety
  of issues in the process

This last item had a silly one-character bug in the topic branch that
was sent to me.  Because it caused pretty bad selftest failures in
some configurations, I decided to squash in the fix.  So, while the
exact commit ids haven't been in linux-next before the merge window,
the code has.

Generic:

* Wrap kvm_{gfn,hva}_range.pte in a union to allow mmu_notifier events to pass
  action specific data without needing to constantly update the main handlers.

* Drop unused function declarations

Selftests:

* Add testcases to x86's sync_regs_test for detecting KVM TOCTOU bugs

* Add support for printf() in guest code and convert all guest asserts to use
  printf-based reporting

* Clean up the PMU event filter test and add new testcases

* Include x86 selftests in the KVM x86 MAINTAINERS entry

----------------------------------------------------------------
Aaron Lewis (5):
      KVM: selftests: Add strnlen() to the string overrides
      KVM: selftests: Add guest_snprintf() to KVM selftests
      KVM: selftests: Add additional pages to the guest to accommodate ucall
      KVM: selftests: Add string formatting options to ucall
      KVM: selftests: Add a selftest for guest prints and formatted asserts

Alexey Kardashevskiy (6):
      KVM: SEV: move set_dr_intercepts/clr_dr_intercepts from the header
      KVM: SEV: Move SEV's GP_VECTOR intercept setup to SEV
      KVM: SEV-ES: explicitly disable debug
      KVM: SVM/SEV/SEV-ES: Rework intercepts
      KVM: SEV: Enable data breakpoints in SEV-ES
      KVM: SEV-ES: Eliminate #DB intercept when DebugSwap enabled

Andrew Jones (9):
      RISC-V: KVM: Improve vector save/restore errors
      RISC-V: KVM: Improve vector save/restore functions
      KVM: arm64: selftests: Replace str_with_index with strdup_printf
      KVM: arm64: selftests: Drop SVE cap check in print_reg
      KVM: arm64: selftests: Remove print_reg's dependency on vcpu_config
      KVM: arm64: selftests: Rename vcpu_config and add to kvm_util.h
      KVM: arm64: selftests: Delete core_reg_fixup
      KVM: arm64: selftests: Split get-reg-list test code
      KVM: arm64: selftests: Finish generalizing get-reg-list

Anup Patel (5):
      RISC-V: KVM: Factor-out ONE_REG related code to its own source file
      RISC-V: KVM: Extend ONE_REG to enable/disable multiple ISA extensions
      RISC-V: KVM: Allow Zba and Zbs extensions for Guest/VM
      RISC-V: KVM: Allow Zicntr, Zicsr, Zifencei, and Zihpm for Guest/VM
      RISC-V: KVM: Sort ISA extensions alphabetically in ONE_REG interface

Bibo Mao (1):
      KVM: selftests: use unified time type for comparison

Daniel Henrique Barboza (10):
      RISC-V: KVM: provide UAPI for host SATP mode
      RISC-V: KVM: return ENOENT in *_one_reg() when reg is unknown
      RISC-V: KVM: use ENOENT in *_one_reg() when extension is unavailable
      RISC-V: KVM: do not EOPNOTSUPP in set_one_reg() zicbo(m|z)
      RISC-V: KVM: do not EOPNOTSUPP in set KVM_REG_RISCV_TIMER_REG
      RISC-V: KVM: use EBUSY when !vcpu->arch.ran_atleast_once
      RISC-V: KVM: avoid EBUSY when writing same ISA val
      RISC-V: KVM: avoid EBUSY when writing the same machine ID val
      RISC-V: KVM: avoid EBUSY when writing the same isa_ext val
      docs: kvm: riscv: document EBUSY in KVM_SET_ONE_REG

David Matlack (3):
      KVM: Rename kvm_arch_flush_remote_tlb() to kvm_arch_flush_remote_tlbs()
      KVM: Allow range-based TLB invalidation from common code
      KVM: Move kvm_arch_flush_remote_tlbs_memslot() to common code

Fuad Tabba (1):
      KVM: arm64: Remove redundant kvm_set_pfn_accessed() from user_mem_abort()

Haibo Xu (6):
      KVM: arm64: selftests: Move reject_set check logic to a function
      KVM: arm64: selftests: Move finalize_vcpu back to run_test
      KVM: selftests: Only do get/set tests on present blessed list
      KVM: selftests: Add skip_set facility to get_reg_list test
      KVM: riscv: Add KVM_GET_REG_LIST API support
      KVM: riscv: selftests: Add get-reg-list test

Ilya Leoshkevich (6):
      KVM: s390: interrupt: Fix single-stepping into interrupt handlers
      KVM: s390: interrupt: Fix single-stepping into program interrupt handlers
      KVM: s390: interrupt: Fix single-stepping kernel-emulated instructions
      KVM: s390: interrupt: Fix single-stepping userspace-emulated instructions
      KVM: s390: interrupt: Fix single-stepping keyless mode exits
      KVM: s390: selftests: Add selftest for single-stepping

Janosch Frank (2):
      Merge tag 'kvm-x86-selftests-immutable-6.6' into next
      Merge remote-tracking branch 'vfio-ap' into next

Jinrong Liang (6):
      KVM: selftests: Add x86 properties for Intel PMU in processor.h
      KVM: selftests: Drop the return of remove_event()
      KVM: selftests: Introduce "struct __kvm_pmu_event_filter" to manipulate filter
      KVM: selftests: Add test cases for unsupported PMU event filter input values
      KVM: selftests: Test if event filter meets expectations on fixed counters
      KVM: selftests: Test gp event filters don't affect fixed event filters

Li zeming (1):
      x86: kvm: x86: Remove unnecessary initial values of variables

Like Xu (3):
      KVM: x86: Use sysfs_emit() instead of sprintf()
      KVM: x86: Remove break statements that will never be executed
      KVM: x86/mmu: Move the lockdep_assert of mmu_lock to inside clear_dirty_pt_masked()

Manali Shukla (1):
      KVM: SVM: correct the size of spec_ctrl field in VMCB save area

Marc Zyngier (35):
      Merge branch kvm-arm64/6.6/generic-vcpu into kvmarm-master/next
      arm64: Add missing VA CMO encodings
      arm64: Add missing ERX*_EL1 encodings
      arm64: Add missing DC ZVA/GVA/GZVA encodings
      arm64: Add TLBI operation encodings
      arm64: Add AT operation encodings
      arm64: Add debug registers affected by HDFGxTR_EL2
      arm64: Add missing BRB/CFP/DVP/CPP instructions
      arm64: Add HDFGRTR_EL2 and HDFGWTR_EL2 layouts
      KVM: arm64: Correctly handle ACCDATA_EL1 traps
      KVM: arm64: Add missing HCR_EL2 trap bits
      KVM: arm64: nv: Add FGT registers
      KVM: arm64: Restructure FGT register switching
      KVM: arm64: nv: Add trap forwarding infrastructure
      KVM: arm64: nv: Add trap forwarding for HCR_EL2
      KVM: arm64: nv: Expose FEAT_EVT to nested guests
      KVM: arm64: nv: Add trap forwarding for MDCR_EL2
      KVM: arm64: nv: Add trap forwarding for CNTHCTL_EL2
      KVM: arm64: nv: Add fine grained trap forwarding infrastructure
      KVM: arm64: nv: Add trap forwarding for HFGxTR_EL2
      KVM: arm64: nv: Add trap forwarding for HFGITR_EL2
      KVM: arm64: nv: Add trap forwarding for HDFGxTR_EL2
      KVM: arm64: nv: Add SVC trap forwarding
      KVM: arm64: nv: Expand ERET trap forwarding to handle FGT
      KVM: arm64: nv: Add switching support for HFGxTR/HDFGxTR
      KVM: arm64: nv: Expose FGT to nested guests
      KVM: arm64: Move HCRX_EL2 switch to load/put on VHE systems
      KVM: arm64: nv: Add support for HCRX_EL2
      KVM: arm64: pmu: Resync EL0 state on counter rotation
      KVM: arm64: pmu: Guard PMU emulation definitions with CONFIG_KVM
      KVM: arm64: nv: Add trap description for SPSR_EL2 and ELR_EL2
      Merge branch kvm-arm64/nv-trap-forwarding into kvmarm-master/next
      Merge branch kvm-arm64/tlbi-range into kvmarm-master/next
      Merge branch kvm-arm64/6.6/pmu-fixes into kvmarm-master/next
      Merge branch kvm-arm64/6.6/misc into kvmarm-master/next

Mark Brown (1):
      arm64: Add feature detection for fine grained traps

Michal Luczaj (5):
      KVM: x86: Fix KVM_CAP_SYNC_REGS's sync_regs() TOCTOU issues
      KVM: selftests: Extend x86's sync_regs_test to check for CR4 races
      KVM: selftests: Extend x86's sync_regs_test to check for event vector races
      KVM: selftests: Extend x86's sync_regs_test to check for exception races
      KVM: x86: Remove x86_emulate_ops::guest_has_long_mode

Mingwei Zhang (1):
      KVM: x86/mmu: Plumb "struct kvm" all the way to pte_list_remove()

Minjie Du (1):
      KVM: selftests: Remove superfluous variable assignment

Oliver Upton (4):
      KVM: arm64: Delete pointless switch statement in kvm_reset_vcpu()
      KVM: arm64: Remove pointless check for changed init target
      KVM: arm64: Replace vCPU target with a configuration flag
      KVM: arm64: Always return generic v8 as the preferred target

Paolo Bonzini (10):
      Merge tag 'kvmarm-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
      Merge tag 'kvm-x86-generic-6.6' of https://github.com/kvm-x86/linux into HEAD
      Merge tag 'kvm-x86-selftests-6.6' of https://github.com/kvm-x86/linux into HEAD
      Merge tag 'kvm-s390-next-6.6-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
      Merge tag 'kvm-riscv-6.6-1' of https://github.com/kvm-riscv/linux into HEAD
      Merge tag 'kvm-x86-pmu-6.6' of https://github.com/kvm-x86/linux into HEAD
      Merge tag 'kvm-x86-vmx-6.6' of https://github.com/kvm-x86/linux into HEAD
      Merge tag 'kvm-x86-svm-6.6' of https://github.com/kvm-x86/linux into HEAD
      Merge tag 'kvm-x86-misc-6.6' of https://github.com/kvm-x86/linux into HEAD
      Merge branch 'kvm-x86-mmu-6.6' into HEAD

Raghavendra Rao Ananta (11):
      KVM: Declare kvm_arch_flush_remote_tlbs() globally
      KVM: arm64: Use kvm_arch_flush_remote_tlbs()
      KVM: Remove CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL
      arm64: tlb: Refactor the core flush algorithm of __flush_tlb_range
      arm64: tlb: Implement __flush_s2_tlb_range_op()
      KVM: arm64: Implement __kvm_tlb_flush_vmid_range()
      KVM: arm64: Define kvm_tlb_flush_vmid_range()
      KVM: arm64: Implement kvm_arch_flush_remote_tlbs_range()
      KVM: arm64: Flush only the memslot after write-protect
      KVM: arm64: Invalidate the table entries upon a range
      KVM: arm64: Use TLBI range-based instructions for unmap

Randy Dunlap (1):
      KVM: arm64: nv: Select XARRAY_MULTI to fix build error

Reiji Watanabe (4):
      KVM: arm64: PMU: Disallow vPMU on non-uniform PMUVer
      KVM: arm64: PMU: Avoid inappropriate use of host's PMUVer
      KVM: arm64: PMU: Don't advertise the STALL_SLOT event
      KVM: arm64: PMU: Don't advertise STALL_SLOT_{FRONTEND,BACKEND}

Sean Christopherson (139):
      KVM: SVM: Rewrite sev_es_prepare_switch_to_guest()'s comment about swap types
      KVM: SVM: Don't defer NMI unblocking until next exit for SEV-ES guests
      KVM: SVM: Don't try to pointlessly single-step SEV-ES guests for NMI window
      KVM: selftests: Make TEST_ASSERT_EQ() output look like normal TEST_ASSERT()
      KVM: selftests: Add a shameful hack to preserve/clobber GPRs across ucall
      KVM: selftests: Add formatted guest assert support in ucall framework
      KVM: selftests: Add arch ucall.h and inline simple arch hooks
      KVM: selftests: Add #define of expected KVM exit reason for ucall
      KVM: selftests: Convert aarch_timer to printf style GUEST_ASSERT
      KVM: selftests: Convert debug-exceptions to printf style GUEST_ASSERT
      KVM: selftests: Convert ARM's hypercalls test to printf style GUEST_ASSERT
      KVM: selftests: Convert ARM's page fault test to printf style GUEST_ASSERT
      KVM: selftests: Convert ARM's vGIC IRQ test to printf style GUEST_ASSERT
      KVM: selftests: Convert the memslot performance test to printf guest asserts
      KVM: selftests: Convert s390's memop test to printf style GUEST_ASSERT
      KVM: selftests: Convert s390's tprot test to printf style GUEST_ASSERT
      KVM: selftests: Convert set_memory_region_test to printf-based GUEST_ASSERT
      KVM: selftests: Convert steal_time test to printf style GUEST_ASSERT
      KVM: selftests: Convert x86's CPUID test to printf style GUEST_ASSERT
      KVM: selftests: Convert the Hyper-V extended hypercalls test to printf asserts
      KVM: selftests: Convert the Hyper-V feature test to printf style GUEST_ASSERT
      KVM: selftests: Convert x86's KVM paravirt test to printf style GUEST_ASSERT
      KVM: selftests: Convert the MONITOR/MWAIT test to use printf guest asserts
      KVM: selftests: Convert x86's nested exceptions test to printf guest asserts
      KVM: selftests: Convert x86's set BSP ID test to printf style guest asserts
      KVM: selftests: Convert the nSVM software interrupt test to printf guest asserts
      KVM: selftests: Convert x86's TSC MSRs test to use printf guest asserts
      KVM: selftests: Convert the x86 userspace I/O test to printf guest assert
      KVM: selftests: Convert VMX's PMU capabilities test to printf guest asserts
      KVM: selftests: Convert x86's XCR0 test to use printf-based guest asserts
      KVM: selftests: Rip out old, param-based guest assert macros
      KVM: selftests: Print out guest RIP on unhandled exception
      KVM: selftests: Use GUEST_FAIL() in ARM's arch timer helpers
      KVM: x86: Snapshot host's MSR_IA32_ARCH_CAPABILITIES
      KVM: VMX: Drop unnecessary vmx_fb_clear_ctrl_available "cache"
      KVM: VMX: Drop manual TLB flush when migrating vmcs.APIC_ACCESS_ADDR
      KVM: SVM: Fix dead KVM_BUG() code in LBR MSR virtualization
      KVM: SVM: Clean up handling of LBR virtualization enabled
      KVM: SVM: Use svm_get_lbr_vmcb() helper to handle writes to DEBUGCTL
      KVM: x86/pmu: Use enums instead of hardcoded magic for arch event indices
      KVM: x86/pmu: Simplify intel_hw_event_available()
      KVM: x86/pmu: Require nr fixed_pmc_events to match nr max fixed counters
      KVM: x86/pmu: Move .hw_event_available() check out of PMC filter helper
      KVM: x86: Retry APIC optimized map recalc if vCPU is added/enabled
      x86/reboot: VMCLEAR active VMCSes before emergency reboot
      x86/reboot: Harden virtualization hooks for emergency reboot
      x86/reboot: KVM: Handle VMXOFF in KVM's reboot callback
      x86/reboot: KVM: Disable SVM during reboot via virt/KVM reboot callback
      x86/reboot: Assert that IRQs are disabled when turning off virtualization
      x86/reboot: Hoist "disable virt" helpers above "emergency reboot" path
      x86/reboot: Disable virtualization during reboot iff callback is registered
      x86/reboot: Expose VMCS crash hooks if and only if KVM_{INTEL,AMD} is enabled
      x86/virt: KVM: Open code cpu_has_vmx() in KVM VMX
      x86/virt: KVM: Move VMXOFF helpers into KVM VMX
      KVM: SVM: Make KVM_AMD depend on CPU_SUP_AMD or CPU_SUP_HYGON
      x86/virt: Drop unnecessary check on extended CPUID level in cpu_has_svm()
      x86/virt: KVM: Open code cpu_has_svm() into kvm_is_svm_supported()
      KVM: SVM: Check that the current CPU supports SVM in kvm_is_svm_supported()
      KVM: VMX: Ensure CPU is stable when probing basic VMX support
      x86/virt: KVM: Move "disable SVM" helper into KVM SVM
      KVM: x86: Force kvm_rebooting=true during emergency reboot/crash
      KVM: SVM: Use "standard" stgi() helper when disabling SVM
      KVM: VMX: Skip VMCLEAR logic during emergency reboots if CR4.VMXE=0
      KVM: nSVM: Check instead of asserting on nested TSC scaling support
      KVM: nSVM: Load L1's TSC multiplier based on L1 state, not L2 state
      KVM: nSVM: Use the "outer" helper for writing multiplier to MSR_AMD64_TSC_RATIO
      KVM: SVM: Clean up preemption toggling related to MSR_AMD64_TSC_RATIO
      KVM: x86: Always write vCPU's current TSC offset/ratio in vendor hooks
      KVM: nSVM: Skip writes to MSR_AMD64_TSC_RATIO if guest state isn't loaded
      KVM: Wrap kvm_{gfn,hva}_range.pte in a per-action union
      KVM: x86: Remove WARN sanity check on hypervisor timer vs. UNINITIALIZED vCPU
      KVM: SVM: Take and hold ir_list_lock when updating vCPU's Physical ID entry
      KVM: SVM: Set target pCPU during IRTE update if target vCPU is running
      KVM: x86: Add a framework for enabling KVM-governed x86 features
      KVM: x86/mmu: Use KVM-governed feature framework to track "GBPAGES enabled"
      KVM: VMX: Recompute "XSAVES enabled" only after CPUID update
      KVM: VMX: Check KVM CPU caps, not just VMX MSR support, for XSAVE enabling
      KVM: VMX: Rename XSAVES control to follow KVM's preferred "ENABLE_XYZ"
      KVM: x86: Use KVM-governed feature framework to track "XSAVES enabled"
      KVM: nVMX: Use KVM-governed feature framework to track "nested VMX enabled"
      KVM: nSVM: Use KVM-governed feature framework to track "NRIPS enabled"
      KVM: nSVM: Use KVM-governed feature framework to track "TSC scaling enabled"
      KVM: nSVM: Use KVM-governed feature framework to track "vVM{SAVE,LOAD} enabled"
      KVM: nSVM: Use KVM-governed feature framework to track "LBRv enabled"
      KVM: nSVM: Use KVM-governed feature framework to track "Pause Filter enabled"
      KVM: nSVM: Use KVM-governed feature framework to track "vGIF enabled"
      KVM: nSVM: Use KVM-governed feature framework to track "vNMI enabled"
      KVM: x86: Disallow guest CPUID lookups when IRQs are disabled
      KVM: SVM: Get source vCPUs from source VM for SEV-ES intrahost migration
      KVM: SVM: Skip VMSA init in sev_es_init_vmcb() if pointer is NULL
      KVM: SVM: Don't inject #UD if KVM attempts to skip SEV guest insn
      KVM: SVM: Require nrips support for SEV guests (and beyond)
      KVM: selftests: Reload "good" vCPU state if vCPU hits shutdown
      KVM: selftests: Explicit set #UD when *potentially* injecting exception
      KVM: x86: Update MAINTAINTERS to include selftests
      KVM: VMX: Delete ancient pr_warn() about KVM_SET_TSS_ADDR not being set
      KVM: VMX: Refresh available regs and IDT vectoring info before NMI handling
      KVM: x86/mmu: Guard against collision with KVM-defined PFERR_IMPLICIT_ACCESS
      KVM: x86/mmu: Delete pgprintk() and all its usage
      KVM: x86/mmu: Delete rmap_printk() and all its usage
      KVM: x86/mmu: Delete the "dbg" module param
      KVM: x86/mmu: Avoid pointer arithmetic when iterating over SPTEs
      KVM: x86/mmu: Cleanup sanity check of SPTEs at SP free
      KVM: x86/mmu: Rename MMU_WARN_ON() to KVM_MMU_WARN_ON()
      KVM: x86/mmu: Convert "runtime" WARN_ON() assertions to WARN_ON_ONCE()
      KVM: x86/mmu: Bug the VM if a vCPU ends up in long mode without PAE enabled
      KVM: x86/mmu: Replace MMU_DEBUG with proper KVM_PROVE_MMU Kconfig
      KVM: x86/mmu: Use BUILD_BUG_ON_INVALID() for KVM_MMU_WARN_ON() stub
      KVM: x86/mmu: BUG() in rmap helpers iff CONFIG_BUG_ON_DATA_CORRUPTION=y
      drm/i915/gvt: Verify pfn is "valid" before dereferencing "struct page"
      drm/i915/gvt: Verify hugepages are contiguous in physical address space
      drm/i915/gvt: Put the page reference obtained by KVM's gfn_to_pfn()
      drm/i915/gvt: Explicitly check that vGPU is attached before shadowing
      drm/i915/gvt: Error out on an attempt to shadowing an unknown GTT entry type
      drm/i915/gvt: Don't rely on KVM's gfn_to_pfn() to query possible 2M GTT
      drm/i915/gvt: Use an "unsigned long" to iterate over memslot gfns
      drm/i915/gvt: Drop unused helper intel_vgpu_reset_gtt()
      drm/i915/gvt: Protect gfn hash table with vgpu_lock
      KVM: x86/mmu: Move kvm_arch_flush_shadow_{all,memslot}() to mmu.c
      KVM: x86/mmu: Don't rely on page-track mechanism to flush on memslot change
      KVM: x86/mmu: Don't bounce through page-track mechanism for guest PTEs
      KVM: drm/i915/gvt: Drop @vcpu from KVM's ->track_write() hook
      KVM: x86: Reject memslot MOVE operations if KVMGT is attached
      drm/i915/gvt: Don't bother removing write-protection on to-be-deleted slot
      KVM: x86/mmu: Move KVM-only page-track declarations to internal header
      KVM: x86/mmu: Use page-track notifiers iff there are external users
      KVM: x86/mmu: Drop infrastructure for multiple page-track modes
      KVM: x86/mmu: Rename page-track APIs to reflect the new reality
      KVM: x86/mmu: Assert that correct locks are held for page write-tracking
      KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled
      KVM: x86/mmu: Drop @slot param from exported/external page-track APIs
      KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers
      drm/i915/gvt: Drop final dependencies on KVM internal details
      KVM: x86/mmu: Add helper to convert root hpa to shadow page
      KVM: x86/mmu: Harden new PGD against roots without shadow pages
      KVM: x86/mmu: Harden TDP MMU iteration against root w/o shadow page
      KVM: x86/mmu: Disallow guest from using !visible slots for page tables
      KVM: x86/mmu: Use dummy root, backed by zero page, for !visible guest roots
      KVM: x86/mmu: Include mmu.h in spte.h

Shaoqin Huang (1):
      KVM: arm64: Use the known cpu id instead of smp_processor_id()

Shiyuan Gao (1):
      KVM: VMX: Rename vmx_get_max_tdp_level() to vmx_get_max_ept_level()

Steffen Eiden (3):
      s390/uv: UV feature check utility
      KVM: s390: Add UV feature negotiation
      KVM: s390: pv: Allow AP-instructions for pv-guests

Takahiro Itazuri (1):
      KVM: x86: Advertise host CPUID 0x80000005 in KVM_GET_SUPPORTED_CPUID

Tao Su (1):
      KVM: x86: Advertise AMX-COMPLEX CPUID to userspace

Thomas Huth (1):
      KVM: selftests: Rename the ASSERT_EQ macro

Viktor Mihajlovski (1):
      KVM: s390: pv: relax WARN_ONCE condition for destroy fast

Vincent Donnefort (1):
      KVM: arm64: Remove size-order align in the nVHE hyp private VA range

Yan Zhao (5):
      drm/i915/gvt: remove interface intel_gvt_is_valid_gfn
      drm/i915/gvt: Don't try to unpin an empty page range
      KVM: x86: Add a new page-track hook to handle memslot deletion
      drm/i915/gvt: switch from ->track_flush_slot() to ->track_remove_region()
      KVM: x86: Remove the unused page-track hook track_flush_slot()

Yue Haibing (3):
      KVM: arm64: Remove unused declarations
      KVM: Remove unused kvm_device_{get,put}() declarations
      KVM: Remove unused kvm_make_cpus_request_mask() declaration

Zenghui Yu (1):
      KVM: arm64: Drop HCR_VIRT_EXCP_MASK

 Documentation/virt/kvm/api.rst                     |    4 +-
 MAINTAINERS                                        |    2 +
 arch/arm/include/asm/arm_pmuv3.h                   |    2 +
 arch/arm64/include/asm/kvm_arm.h                   |   51 +-
 arch/arm64/include/asm/kvm_asm.h                   |    3 +
 arch/arm64/include/asm/kvm_host.h                  |   24 +-
 arch/arm64/include/asm/kvm_mmu.h                   |    1 +
 arch/arm64/include/asm/kvm_nested.h                |    2 +
 arch/arm64/include/asm/kvm_pgtable.h               |   10 +
 arch/arm64/include/asm/sysreg.h                    |  268 ++-
 arch/arm64/include/asm/tlbflush.h                  |  122 +-
 arch/arm64/kernel/cpufeature.c                     |    7 +
 arch/arm64/kvm/Kconfig                             |    2 +-
 arch/arm64/kvm/arm.c                               |   65 +-
 arch/arm64/kvm/emulate-nested.c                    | 1852 ++++++++++++++++++++
 arch/arm64/kvm/guest.c                             |   15 -
 arch/arm64/kvm/handle_exit.c                       |   29 +-
 arch/arm64/kvm/hyp/include/hyp/switch.h            |  139 +-
 arch/arm64/kvm/hyp/include/nvhe/mm.h               |    1 +
 arch/arm64/kvm/hyp/nvhe/hyp-main.c                 |   11 +
 arch/arm64/kvm/hyp/nvhe/mm.c                       |   83 +-
 arch/arm64/kvm/hyp/nvhe/setup.c                    |   27 +-
 arch/arm64/kvm/hyp/nvhe/switch.c                   |    2 +-
 arch/arm64/kvm/hyp/nvhe/tlb.c                      |   30 +
 arch/arm64/kvm/hyp/pgtable.c                       |   63 +-
 arch/arm64/kvm/hyp/vhe/tlb.c                       |   28 +
 arch/arm64/kvm/mmu.c                               |  104 +-
 arch/arm64/kvm/nested.c                            |   11 +-
 arch/arm64/kvm/pmu-emul.c                          |   37 +-
 arch/arm64/kvm/pmu.c                               |   18 +
 arch/arm64/kvm/reset.c                             |   23 +-
 arch/arm64/kvm/sys_regs.c                          |   15 +
 arch/arm64/kvm/trace_arm.h                         |   26 +
 arch/arm64/kvm/vgic/vgic.h                         |    2 -
 arch/arm64/tools/cpucaps                           |    1 +
 arch/arm64/tools/sysreg                            |  129 ++
 arch/mips/include/asm/kvm_host.h                   |    3 +-
 arch/mips/kvm/mips.c                               |   12 +-
 arch/mips/kvm/mmu.c                                |    2 +-
 arch/riscv/include/asm/csr.h                       |    2 +
 arch/riscv/include/asm/kvm_host.h                  |    9 +
 arch/riscv/include/asm/kvm_vcpu_vector.h           |    6 +-
 arch/riscv/include/uapi/asm/kvm.h                  |   16 +
 arch/riscv/kvm/Makefile                            |    1 +
 arch/riscv/kvm/aia.c                               |    4 +-
 arch/riscv/kvm/mmu.c                               |    8 +-
 arch/riscv/kvm/vcpu.c                              |  547 +-----
 arch/riscv/kvm/vcpu_fp.c                           |   12 +-
 arch/riscv/kvm/vcpu_onereg.c                       | 1051 +++++++++++
 arch/riscv/kvm/vcpu_sbi.c                          |   16 +-
 arch/riscv/kvm/vcpu_timer.c                        |   11 +-
 arch/riscv/kvm/vcpu_vector.c                       |   72 +-
 arch/s390/include/asm/kvm_host.h                   |    5 +
 arch/s390/include/asm/uv.h                         |   25 +-
 arch/s390/include/uapi/asm/kvm.h                   |   16 +
 arch/s390/kernel/uv.c                              |    5 +-
 arch/s390/kvm/intercept.c                          |   38 +-
 arch/s390/kvm/interrupt.c                          |   14 +
 arch/s390/kvm/kvm-s390.c                           |  102 +-
 arch/s390/kvm/kvm-s390.h                           |   12 -
 arch/s390/kvm/pv.c                                 |   23 +-
 arch/s390/mm/fault.c                               |    2 +-
 arch/x86/include/asm/cpufeatures.h                 |    1 +
 arch/x86/include/asm/kexec.h                       |    2 -
 arch/x86/include/asm/kvm_host.h                    |   46 +-
 arch/x86/include/asm/kvm_page_track.h              |   71 +-
 arch/x86/include/asm/reboot.h                      |    7 +
 arch/x86/include/asm/svm.h                         |    5 +-
 arch/x86/include/asm/virtext.h                     |  154 --
 arch/x86/include/asm/vmx.h                         |    2 +-
 arch/x86/kernel/crash.c                            |   31 -
 arch/x86/kernel/reboot.c                           |   66 +-
 arch/x86/kvm/Kconfig                               |   15 +-
 arch/x86/kvm/cpuid.c                               |   40 +-
 arch/x86/kvm/cpuid.h                               |   46 +
 arch/x86/kvm/emulate.c                             |    2 -
 arch/x86/kvm/governed_features.h                   |   21 +
 arch/x86/kvm/hyperv.c                              |    1 -
 arch/x86/kvm/kvm_emulate.h                         |    1 -
 arch/x86/kvm/lapic.c                               |   29 +-
 arch/x86/kvm/mmu.h                                 |    2 +
 arch/x86/kvm/mmu/mmu.c                             |  371 ++--
 arch/x86/kvm/mmu/mmu_internal.h                    |   27 +-
 arch/x86/kvm/mmu/page_track.c                      |  292 +--
 arch/x86/kvm/mmu/page_track.h                      |   58 +
 arch/x86/kvm/mmu/paging_tmpl.h                     |   41 +-
 arch/x86/kvm/mmu/spte.c                            |    6 +-
 arch/x86/kvm/mmu/spte.h                            |   21 +-
 arch/x86/kvm/mmu/tdp_iter.c                        |   11 +-
 arch/x86/kvm/mmu/tdp_mmu.c                         |   37 +-
 arch/x86/kvm/pmu.c                                 |    4 +-
 arch/x86/kvm/reverse_cpuid.h                       |    1 +
 arch/x86/kvm/svm/avic.c                            |   59 +-
 arch/x86/kvm/svm/nested.c                          |   57 +-
 arch/x86/kvm/svm/sev.c                             |  100 +-
 arch/x86/kvm/svm/svm.c                             |  327 ++--
 arch/x86/kvm/svm/svm.h                             |   61 +-
 arch/x86/kvm/vmx/capabilities.h                    |    2 +-
 arch/x86/kvm/vmx/hyperv.c                          |    2 +-
 arch/x86/kvm/vmx/nested.c                          |   13 +-
 arch/x86/kvm/vmx/nested.h                          |    2 +-
 arch/x86/kvm/vmx/pmu_intel.c                       |   81 +-
 arch/x86/kvm/vmx/vmx.c                             |  228 +--
 arch/x86/kvm/vmx/vmx.h                             |    3 +-
 arch/x86/kvm/x86.c                                 |   85 +-
 arch/x86/kvm/x86.h                                 |    1 +
 drivers/gpu/drm/i915/gvt/gtt.c                     |  102 +-
 drivers/gpu/drm/i915/gvt/gtt.h                     |    1 -
 drivers/gpu/drm/i915/gvt/gvt.h                     |    3 +-
 drivers/gpu/drm/i915/gvt/kvmgt.c                   |  120 +-
 drivers/gpu/drm/i915/gvt/page_track.c              |   10 +-
 drivers/perf/arm_pmuv3.c                           |    2 +
 drivers/s390/crypto/vfio_ap_ops.c                  |  172 +-
 drivers/s390/crypto/vfio_ap_private.h              |    6 +-
 include/kvm/arm_pmu.h                              |    4 +-
 include/linux/kvm_host.h                           |   53 +-
 tools/arch/x86/include/asm/cpufeatures.h           |    1 +
 tools/testing/selftests/kvm/Makefile               |   20 +-
 .../selftests/kvm/aarch64/aarch32_id_regs.c        |    8 +-
 tools/testing/selftests/kvm/aarch64/arch_timer.c   |   22 +-
 .../selftests/kvm/aarch64/debug-exceptions.c       |    8 +-
 tools/testing/selftests/kvm/aarch64/get-reg-list.c |  554 +-----
 tools/testing/selftests/kvm/aarch64/hypercalls.c   |   20 +-
 .../selftests/kvm/aarch64/page_fault_test.c        |   17 +-
 tools/testing/selftests/kvm/aarch64/vgic_irq.c     |    3 +-
 tools/testing/selftests/kvm/get-reg-list.c         |  401 +++++
 tools/testing/selftests/kvm/guest_print_test.c     |  219 +++
 .../selftests/kvm/include/aarch64/arch_timer.h     |   12 +-
 .../testing/selftests/kvm/include/aarch64/ucall.h  |   20 +
 .../testing/selftests/kvm/include/kvm_util_base.h  |   21 +
 .../selftests/kvm/include/riscv/processor.h        |    3 +
 tools/testing/selftests/kvm/include/riscv/ucall.h  |   20 +
 tools/testing/selftests/kvm/include/s390x/ucall.h  |   19 +
 tools/testing/selftests/kvm/include/test_util.h    |   20 +-
 tools/testing/selftests/kvm/include/ucall_common.h |   98 +-
 .../selftests/kvm/include/x86_64/processor.h       |    5 +
 tools/testing/selftests/kvm/include/x86_64/ucall.h |   13 +
 tools/testing/selftests/kvm/kvm_page_table_test.c  |    8 +-
 tools/testing/selftests/kvm/lib/aarch64/ucall.c    |   11 +-
 tools/testing/selftests/kvm/lib/guest_sprintf.c    |  307 ++++
 tools/testing/selftests/kvm/lib/kvm_util.c         |    6 +-
 tools/testing/selftests/kvm/lib/riscv/ucall.c      |   11 -
 tools/testing/selftests/kvm/lib/s390x/ucall.c      |   10 -
 tools/testing/selftests/kvm/lib/sparsebit.c        |    1 -
 tools/testing/selftests/kvm/lib/string_override.c  |    9 +
 tools/testing/selftests/kvm/lib/test_util.c        |   15 +
 tools/testing/selftests/kvm/lib/ucall_common.c     |   44 +
 tools/testing/selftests/kvm/lib/x86_64/processor.c |   18 +-
 tools/testing/selftests/kvm/lib/x86_64/ucall.c     |   36 +-
 .../testing/selftests/kvm/max_guest_memory_test.c  |    2 +-
 tools/testing/selftests/kvm/memslot_perf_test.c    |    4 +-
 tools/testing/selftests/kvm/riscv/get-reg-list.c   |  872 +++++++++
 tools/testing/selftests/kvm/s390x/cmma_test.c      |   62 +-
 tools/testing/selftests/kvm/s390x/debug_test.c     |  160 ++
 tools/testing/selftests/kvm/s390x/memop.c          |   13 +-
 tools/testing/selftests/kvm/s390x/tprot.c          |   11 +-
 .../testing/selftests/kvm/set_memory_region_test.c |   21 +-
 tools/testing/selftests/kvm/steal_time.c           |   20 +-
 tools/testing/selftests/kvm/x86_64/cpuid_test.c    |   12 +-
 .../kvm/x86_64/dirty_log_page_splitting_test.c     |   18 +-
 .../kvm/x86_64/exit_on_emulation_failure_test.c    |    2 +-
 .../kvm/x86_64/hyperv_extended_hypercalls.c        |    3 +-
 .../testing/selftests/kvm/x86_64/hyperv_features.c |   29 +-
 tools/testing/selftests/kvm/x86_64/kvm_pv_test.c   |    8 +-
 .../selftests/kvm/x86_64/monitor_mwait_test.c      |   35 +-
 .../selftests/kvm/x86_64/nested_exceptions_test.c  |   16 +-
 .../selftests/kvm/x86_64/pmu_event_filter_test.c   |  317 +++-
 .../selftests/kvm/x86_64/recalc_apic_map_test.c    |    6 +-
 .../testing/selftests/kvm/x86_64/set_boot_cpu_id.c |    6 +-
 .../kvm/x86_64/svm_nested_soft_inject_test.c       |   22 +-
 .../testing/selftests/kvm/x86_64/sync_regs_test.c  |  132 ++
 tools/testing/selftests/kvm/x86_64/tsc_msrs_test.c |   34 +-
 .../selftests/kvm/x86_64/userspace_io_test.c       |   10 +-
 .../vmx_exception_with_invalid_guest_state.c       |    2 +-
 .../selftests/kvm/x86_64/vmx_pmu_caps_test.c       |   31 +-
 .../selftests/kvm/x86_64/xapic_state_test.c        |    8 +-
 .../testing/selftests/kvm/x86_64/xcr0_cpuid_test.c |   29 +-
 .../testing/selftests/kvm/x86_64/xen_vmcall_test.c |   20 +-
 virt/kvm/Kconfig                                   |    3 -
 virt/kvm/kvm_main.c                                |   54 +-
 180 files changed, 8839 insertions(+), 3231 deletions(-)
 create mode 100644 arch/riscv/kvm/vcpu_onereg.c
 delete mode 100644 arch/x86/include/asm/virtext.h
 create mode 100644 arch/x86/kvm/governed_features.h
 create mode 100644 arch/x86/kvm/mmu/page_track.h
 create mode 100644 tools/testing/selftests/kvm/get-reg-list.c
 create mode 100644 tools/testing/selftests/kvm/guest_print_test.c
 create mode 100644 tools/testing/selftests/kvm/include/aarch64/ucall.h
 create mode 100644 tools/testing/selftests/kvm/include/riscv/ucall.h
 create mode 100644 tools/testing/selftests/kvm/include/s390x/ucall.h
 create mode 100644 tools/testing/selftests/kvm/include/x86_64/ucall.h
 create mode 100644 tools/testing/selftests/kvm/lib/guest_sprintf.c
 create mode 100644 tools/testing/selftests/kvm/riscv/get-reg-list.c
 create mode 100644 tools/testing/selftests/kvm/s390x/debug_test.c


^ permalink raw reply

* Re: [PATCH 0/2] KVM: x86/mmu: .change_pte() optimization in TDP MMU
From: Sean Christopherson @ 2023-09-06 16:46 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Yan Zhao, kvm, linux-kernel, pbonzini, Christoph Hellwig,
	Marek Szyprowski, Linus Torvalds
In-Reply-To: <5d81a9cd-f96d-bcdb-7878-74c2ead26cfb@arm.com>

On Wed, Sep 06, 2023, Robin Murphy wrote:
> On 2023-09-06 15:44, Sean Christopherson wrote:
> > On Wed, Sep 06, 2023, Robin Murphy wrote:
> > > Even non-virtualised, SWIOTLB is pretty horrible for I/O performance by its
> > > very nature - avoiding it if at all possible should always be preferred.
> > 
> > Yeah.  The main reason I didn't just sweep this under the rug is the confidential
> > VM use case, where SWIOTLB is used to bounce data from guest private memory into
> > shared buffers.  There's also a good argument that anyone that cares about I/O
> > performance in confidential VMs should put in the effort to enlighten their device
> > drivers to use shared memory directly, but practically speaking that's easier said
> > than done.
> 
> Indeed a bunch of work has gone into SWIOTLB recently trying to make it a
> bit more efficient for such cases where it can't be avoided, so it is
> definitely still interesting to learn about impacts at other levels like
> this. Maybe there's a bit of a get-out for confidential VMs though, since
> presumably there's not much point COW-ing encrypted private memory, so
> perhaps KVM might end up wanting to optimise that out and thus happen to end
> up less sensitive to unavoidable SWIOTLB behaviour anyway?

CoW should be a non-issue for confidential VMs, at least on x86.  SEV and SEV-ES
are effectively forced to pin memory as writable before it can be mapped into the
guest.  TDX and SNP will have a different implementation, but similar behavior.

Confidential VMs would benefit purely by either eliminating or reducing the cost
of "initializing" memory, i.e. by eliminating the memcpy() or replacing it with a
memset().  I most definitely don't care enough about confidential VM I/O performance
to try and micro-optimize that behavior; their existence was simply what made me
look more closely instead of just telling Yan to stop using IDE :-)

^ permalink raw reply

* Re: [PATCH 0/2] KVM: x86/mmu: .change_pte() optimization in TDP MMU
From: Robin Murphy @ 2023-09-06 16:18 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Yan Zhao, kvm, linux-kernel, pbonzini, Christoph Hellwig,
	Marek Szyprowski, Linus Torvalds
In-Reply-To: <ZPiQQ0OANuaOYdIS@google.com>

On 2023-09-06 15:44, Sean Christopherson wrote:
> On Wed, Sep 06, 2023, Robin Murphy wrote:
>> On 2023-09-05 19:59, Sean Christopherson wrote:
>>> And if the driver *doesn't* initialize the data, then the copy is at best pointless,
>>> and possibly even worse than leaking stale swiotlb data.
>>
>> Other than the overhead, done right it can't be any worse than if SWIOTLB
>> were not involved at all.
> 
> Yep.
> 
>>> Looking at commit ddbd89deb7d3 ("swiotlb: fix info leak with DMA_FROM_DEVICE"),
>>> IIUC the data leak was observed with a synthetic test "driver" that was developed
>>> to verify a real data leak fixed by commit a45b599ad808 ("scsi: sg: allocate with
>>> __GFP_ZERO in sg_build_indirect()").  Which basically proves my point: copying
>>> from the source only adds value absent a bug in the owning driver.
>>
>> Huh? IIUC the bug there was that the SCSI layer failed to sanitise
>> partially-written buffers. That bug was fixed, and the scrutiny therein
>> happened to reveal that SWIOTLB *also* had a lower-level problem with
>> partial writes, in that it was corrupting DMA-mapped memory which was not
>> updated by the device. Partial DMA writes are not in themselves indicative
>> of a bug, they may well be a valid and expected behaviour.
> 
> The problem is that the comment only talks about leaking data to userspace, and
> doesn't say anything about data corruption or the "swiotlb needs to match hardware"
> justification that Linus pointed out.  I buy both of those arguments for copying
> data from the original page, but the "may prevent leaking swiotlb content" is IMO
> completely nonsensical, because if preventing leakage is the only goal, then
> explicitly initializing the memory is better in every way.
> 
> If no one objects, I'll put together a patch to rewrite the comment in terms of
> mimicking hardware and not corrupting the caller's data.

Sounds good to me. I guess the trouble is that as soon as a CVE is 
involved it can then get hard to look past it, or want to risk appearing 
to downplay it :)
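
To make the partial-write point above concrete, here is a minimal toy
model, not the kernel's swiotlb code. All names (bounce_from_device,
enum bounce_init) are hypothetical and exist only for illustration. It
assumes a DMA_FROM_DEVICE mapping where the device writes only part of
the buffer and the whole bounce slot is copied back on unmap:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: hypothetical names, not the kernel's swiotlb API. */
enum bounce_init { BOUNCE_STALE, BOUNCE_ZERO, BOUNCE_COPY_ORIG };

/*
 * Simulate a DMA_FROM_DEVICE transfer through a bounce slot where the
 * device writes only the first dev_len bytes of a len-byte mapping.
 */
static void bounce_from_device(unsigned char *orig, unsigned char *slot,
			       size_t len, const unsigned char *dev_data,
			       size_t dev_len, enum bounce_init init)
{
	assert(dev_len <= len);

	/* map: decide what the bounce slot holds before the device runs */
	if (init == BOUNCE_ZERO)
		memset(slot, 0, len);
	else if (init == BOUNCE_COPY_ORIG)
		memcpy(slot, orig, len);
	/* BOUNCE_STALE: slot keeps whatever the previous user left behind */

	/* the device performs a partial write */
	memcpy(slot, dev_data, dev_len);

	/* unmap: the whole slot is copied back over the caller's buffer */
	memcpy(orig, slot, len);
}
```

With a 4-byte buffer "ABCD" and a device that writes only "12", the
three policies leave the caller with "12" followed by stale slot bytes
(the leak), "12" followed by zeros (no leak, but the caller's "CD" is
gone), or "12CD" (what hardware without bouncing would have produced).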

>>> IMO, rather than copying from the original memory, swiotlb_tbl_map_single() should
>>> simply zero the original page(s) when establishing the mapping.  That would harden
>>> all usage of swiotlb and avoid the read-before-write behavior that is problematic
>>> for KVM.
>>
>> Depends on one's exact definition of "harden"... Corrupting memory with
>> zeros is less bad than corrupting memory with someone else's data if you
>> look at it from an information security point of view, but from a
>> not-corrupting-memory point of view it's definitely still corrupting memory
>> :/
>>
>> Taking a step back, is there not an argument that if people care about
>> general KVM performance then they should maybe stop emulating obsolete PC
>> hardware from 30 years ago, and at least emulate obsolete PC hardware from
>> 20 years ago that supports 64-bit DMA?
> 
> Heh, I don't think there's an argument per se, people most definitely shouldn't
> be emulating old hardware if they care about performance.  I already told Yan as
> much.
> 
>> Even non-virtualised, SWIOTLB is pretty horrible for I/O performance by its
>> very nature - avoiding it if at all possible should always be preferred.
> 
> Yeah.  The main reason I didn't just sweep this under the rug is the confidential
> VM use case, where SWIOTLB is used to bounce data from guest private memory into
> shared buffers.  There's also a good argument that anyone that cares about I/O
> performance in confidential VMs should put in the effort to enlighten their device
> drivers to use shared memory directly, but practically speaking that's easier said
> than done.

Indeed a bunch of work has gone into SWIOTLB recently trying to make it 
a bit more efficient for such cases where it can't be avoided, so it is 
definitely still interesting to learn about impacts at other levels like 
this. Maybe there's a bit of a get-out for confidential VMs though, 
since presumably there's not much point COW-ing encrypted private 
memory, so perhaps KVM might end up wanting to optimise that out and 
thus happen to end up less sensitive to unavoidable SWIOTLB behaviour 
anyway?

Cheers,
Robin.

^ permalink raw reply

* Re: [PATCH 2/5] nSVM: Check for optional commands and reserved encodings of TLB_CONTROL in nested VMCB
From: Stefan Sterz @ 2023-09-06 15:59 UTC (permalink / raw)
  To: Paolo Bonzini, Krish Sadhukhan, kvm
  Cc: jmattson, seanjc, vkuznets, wanpengli, joro
In-Reply-To: <f7c2d5f5-3560-8666-90be-3605220cb93c@redhat.com>

On 28.09.21 18:55, Paolo Bonzini wrote:
> On 21/09/21 01:51, Krish Sadhukhan wrote:
>> According to section "TLB Flush" in APM vol 2,
>>
>>      "Support for TLB_CONTROL commands other than the first two, is
>>       optional and is indicated by CPUID Fn8000_000A_EDX[FlushByAsid].
>>
>>       All encodings of TLB_CONTROL not defined in the APM are reserved."
>>
>> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
>> ---
>>   arch/x86/kvm/svm/nested.c | 19 +++++++++++++++++++
>>   1 file changed, 19 insertions(+)
>>
>> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
>> index 5e13357da21e..028cc2a1f028 100644
>> --- a/arch/x86/kvm/svm/nested.c
>> +++ b/arch/x86/kvm/svm/nested.c
>> @@ -235,6 +235,22 @@ static bool nested_svm_check_bitmap_pa(struct
>> kvm_vcpu *vcpu, u64 pa, u32 size)
>>           kvm_vcpu_is_legal_gpa(vcpu, addr + size - 1);
>>   }
>>   +static bool nested_svm_check_tlb_ctl(struct kvm_vcpu *vcpu, u8 tlb_ctl)
>> +{
>> +    switch(tlb_ctl) {
>> +        case TLB_CONTROL_DO_NOTHING:
>> +        case TLB_CONTROL_FLUSH_ALL_ASID:
>> +            return true;
>> +        case TLB_CONTROL_FLUSH_ASID:
>> +        case TLB_CONTROL_FLUSH_ASID_LOCAL:
>> +            if (guest_cpuid_has(vcpu, X86_FEATURE_FLUSHBYASID))
>> +                return true;
>> +            fallthrough;
>
> Since nested FLUSHBYASID is not supported yet, this second set of case
> labels can go away.
>
> Queued with that change, thanks.
>
> Paolo
>
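
For readers following along, a stand-alone sketch of what the check
reduces to once the FLUSHBYASID case labels are dropped, as Paolo
describes. This is an illustration under assumptions, not the queued
commit: the function name tlb_ctl_is_valid() is hypothetical, the vcpu
parameter is omitted so the snippet compiles on its own, and the
encodings are the values I understand arch/x86/include/asm/svm.h to
define.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* TLB_CONTROL encodings, assumed to match arch/x86/include/asm/svm.h. */
#define TLB_CONTROL_DO_NOTHING		0
#define TLB_CONTROL_FLUSH_ALL_ASID	1
#define TLB_CONTROL_FLUSH_ASID		3
#define TLB_CONTROL_FLUSH_ASID_LOCAL	7

/*
 * Hypothetical stand-alone version of the check: only the two
 * mandatory commands are accepted; the optional FLUSHBYASID commands
 * and all reserved encodings are rejected.
 */
static bool tlb_ctl_is_valid(uint8_t tlb_ctl)
{
	switch (tlb_ctl) {
	case TLB_CONTROL_DO_NOTHING:
	case TLB_CONTROL_FLUSH_ALL_ASID:
		return true;
	default:
		return false;
	}
}
```

Under this sketch, an L1 hypervisor that writes TLB_CONTROL_FLUSH_ASID
(3) into the nested VMCB fails the consistency check, which is the
behaviour the question below is about.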

Are there any plans to support FLUSHBYASID in the future? It seems
VMware Workstation and ESXi require this feature to run on top of KVM
[1]. This means that after the introduction of this check these VMs fail
to boot and report missing features. Hence, upgrading to a newer kernel
version is not an option for some users.

Sorry if I misunderstood something or if this is not the right place to
ask.

[1]: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2008583


^ permalink raw reply

* Re: [PATCH] KVM: x86: Increase KVM_MAX_VCPUS to 4096
From: Vitaly Kuznetsov @ 2023-09-06 15:54 UTC (permalink / raw)
  To: Sean Christopherson, Kyle Meyer
  Cc: pbonzini, tglx, mingo, bp, dave.hasen, x86, hpa, kvm,
	linux-kernel, dmatlack, russ.anderson, dimitri.sivanich,
	steve.wahl
In-Reply-To: <ZNuxtU7kxnv1L88H@google.com>

Sean Christopherson <seanjc@google.com> writes:

> On Tue, Aug 15, 2023, Kyle Meyer wrote:
>> Increase KVM_MAX_VCPUS to 4096 when MAXSMP is enabled.
>> 
>> Notable changes (when MAXSMP is enabled):
>> 
>> * KVM_MAX_VCPUS will increase from 1024 to 4096.
>> * KVM_MAX_VCPU_IDS will increase from 4096 to 16384.
>> * KVM_HV_MAX_SPARSE_VCPU_SET_BITS will increase from 16 to 64.
>> * CPUID[HYPERV_CPUID_IMPLEMENT_LIMITS (0x40000005)].EAX will now be 4096.
>> 
>> * struct kvm will increase from 39408 B to 39792 B.
>> * struct kvm_ioapic will increase from 5240 B to 19064 B.
>> 
>> * The following (on-stack) bitmaps will increase from 128 B to 512 B:
>> 	* dest_vcpu_bitmap in kvm_irq_delivery_to_apic.
>> 	* vcpu_mask in kvm_hv_flush_tlb.
>> 	* vcpu_bitmap in ioapic_write_indirect.
>> 	* vp_bitmap in sparse_set_to_vcpu_mask.
>> 
>> Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
>> ---
>> Virtual machines with 4096 virtual CPUs have been created on 32 socket
>> Cascade Lake and Sapphire Rapids systems.
>> 
>> 4096 is the current maximum value because of the Hyper-V TLFS. See
>> BUILD_BUG_ON in arch/x86/kvm/hyperv.c, commit 79661c3, and Vitaly's
>> comment on https://lore.kernel.org/all/87r136shcc.fsf@redhat.com.
>
> Mostly out of curiosity, do you care about Hyper-V support?   If not, at some
> point it'd probably be worth exploring a CONFIG_KVM_HYPERV option to allow
> disabling KVM's Hyper-V support at compile time so that we're not bound by the
> restrictions of the TLFS.
>

(sorry for necroposting)

While adding CONFIG_KVM_HYPERV to disable all-things-Hyper-V may make
sense for some deployments (and as we already have CONFIG_KVM_XEN), I
don't think we should forbid KVM_MAX_VCPUS > 4096 when it is enabled:
'general purpose' (distro) kernels are used both for hosting large Linux
guests and Windows guests. Instead, I'd suggest we define
KVM_MAX_HV_VCPUS as MIN(KVM_MAX_VCPUS, 4096) and then e.g. fail
KVM_SET_CPUID[,2] if we already have > 4096 vCPUs + fail
kvm_arch_vcpu_create() if we already have something-hyperv enabled on
the already created vCPUs.
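
The capping idea above could look roughly like the following standalone
sketch. Only KVM_MAX_VCPUS, KVM_MAX_HV_VCPUS and the 4096 limit come from
the discussion; the struct, the helper names, the 8192 value and the use
of -EINVAL are hypothetical stand-ins, not existing KVM code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical build-time maximum, larger than the TLFS limit. */
#define KVM_MAX_VCPUS     8192
#define MIN(a, b)         ((a) < (b) ? (a) : (b))
/* Proposed cap for guests that use Hyper-V emulation. */
#define KVM_MAX_HV_VCPUS  MIN(KVM_MAX_VCPUS, 4096)

/* Toy model of per-VM state; not the real struct kvm. */
struct vm_model {
	unsigned int created_vcpus;
	bool hyperv_enabled;
};

/* Models KVM_SET_CPUID[,2] turning on Hyper-V CPUID leaves. */
static int model_enable_hyperv(struct vm_model *vm)
{
	/* Too many vCPUs already exist for a Hyper-V enabled guest. */
	if (vm->created_vcpus > KVM_MAX_HV_VCPUS)
		return -EINVAL;
	vm->hyperv_enabled = true;
	return 0;
}

/* Models kvm_arch_vcpu_create() adding one more vCPU. */
static int model_vcpu_create(struct vm_model *vm)
{
	if (vm->created_vcpus >= KVM_MAX_VCPUS)
		return -EINVAL;
	/* Hyper-V is already enabled, so the lower cap applies. */
	if (vm->hyperv_enabled && vm->created_vcpus >= KVM_MAX_HV_VCPUS)
		return -EINVAL;
	vm->created_vcpus++;
	return 0;
}
```

In this model a VM without Hyper-V enabled can grow up to KVM_MAX_VCPUS,
while a VM that has it enabled stops at KVM_MAX_HV_VCPUS.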

-- 
Vitaly


^ permalink raw reply

* Re: [PATCH 00/13] Implement support for IBS virtualization
From: Manali Shukla @ 2023-09-06 15:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: kvm, seanjc, linux-doc, linux-perf-users, x86, pbonzini, bp,
	santosh.shukla, ravi.bangoria, thomas.lendacky, nikunj
In-Reply-To: <20230905154744.GB28379@noisy.programming.kicks-ass.net>

Hi Peter,

Thank you for looking into this.

On 9/5/2023 9:17 PM, Peter Zijlstra wrote:
> On Mon, Sep 04, 2023 at 09:53:34AM +0000, Manali Shukla wrote:
> 
>> Note that, since IBS registers are swap type C [2], the hypervisor is
>> responsible for saving and restoring of IBS host state. Hypervisor
>> does so only when IBS is active on the host to avoid unnecessary
>> rdmsrs/wrmsrs. Hypervisor needs to disable host IBS before saving the
>> state and enter the guest. After a guest exit, the hypervisor needs to
>> restore host IBS state and re-enable IBS.
> 
> Why do you think it is OK for a guest to disable the host IBS when
> entering a guest? Perhaps the host was wanting to profile the guest.
> 

1. Since IBS registers are of swap type C [1], only guest state is saved
and restored by the hardware. Host state needs to be saved and restored by
hypervisor. In order to save IBS registers correctly, IBS needs to be
disabled before saving the IBS registers.

2. As per APM [2],
"When a VMRUN is executed to an SEV-ES guest with IBS virtualization enabled, the
IbsFetchCtl[IbsFetchEn] and IbsOpCtl[IbsOpEn] MSR bits must be 0. If either of 
these bits are not 0, the VMRUN will fail with a VMEXIT_INVALID error code."
This is enforced by hardware on SEV-ES guests when VIBS is enabled on SEV-ES
guests.

3. VIBS is not enabled by default. It can be enabled by an explicit
qemu command line option "-cpu +ibs". Guest should be invoked without
this option when host wants to profile the guest.
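
The host-side sequence in points 1 and 2 can be modelled as below. This is
a userspace sketch with a fake MSR store, not code from the series; the
struct and helper names are hypothetical, and the enable-bit positions
(IbsFetchEn at bit 48, IbsOpEn at bit 17) follow the kernel's
IBS_FETCH_ENABLE and IBS_OP_ENABLE definitions.

```c
#include <stdbool.h>
#include <stdint.h>

#define IBS_FETCH_ENABLE (1ULL << 48)	/* IbsFetchCtl[IbsFetchEn] */
#define IBS_OP_ENABLE    (1ULL << 17)	/* IbsOpCtl[IbsOpEn] */

/* Stand-in for the two control MSRs; real code would use rdmsr/wrmsr. */
struct ibs_msrs {
	uint64_t fetch_ctl;
	uint64_t op_ctl;
};

/*
 * Before VMRUN: if host IBS is active, disable it and snapshot the
 * host values. Returns false when nothing had to be saved.
 */
static bool ibs_prepare_guest_entry(struct ibs_msrs *hw, struct ibs_msrs *saved)
{
	uint64_t fetch_en = hw->fetch_ctl & IBS_FETCH_ENABLE;
	uint64_t op_en = hw->op_ctl & IBS_OP_ENABLE;

	if (!fetch_en && !op_en)
		return false;	/* host IBS idle: skip the MSR accesses */

	/* Disable first so the registers stop changing while being read. */
	hw->fetch_ctl &= ~IBS_FETCH_ENABLE;
	hw->op_ctl &= ~IBS_OP_ENABLE;

	/* Snapshot, remembering which enable bits were set. */
	saved->fetch_ctl = hw->fetch_ctl | fetch_en;
	saved->op_ctl = hw->op_ctl | op_en;
	return true;
}

/* After a guest exit: write the host values back, re-enabling IBS. */
static void ibs_restore_after_exit(struct ibs_msrs *hw, const struct ibs_msrs *saved)
{
	*hw = *saved;
}
```

After ibs_prepare_guest_entry() both enable bits are 0, which is the
condition the quoted APM text requires at VMRUN.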

[1] https://bugzilla.kernel.org/attachment.cgi?id=304653
    AMD64 Architecture Programmer’s Manual, Vol 2, Appendix B. Layout
    of VMCB,
    Table B-2. VMCB Layout, State Save Area 
    Table B-4. VMSA Layout, State Save Area for SEV-ES

[2] https://bugzilla.kernel.org/attachment.cgi?id=304653
    AMD64 Architecture Programmer’s Manual, Vol 2, Section 15.38,
    Instruction-Based Sampling Virtualization

> Only when perf_event_attr::exclude_guest is set is this allowed,
> otherwise you have to respect the host running IBS and you're not
> allowed to touch it.
> 
> Host trumps guest etc..

- Manali

^ permalink raw reply

* Re: Linux 6.5 speed regression, boot VERY slow with anything systemd related
From: Tony Lindgren @ 2023-09-06 15:26 UTC (permalink / raw)
  To: Marc Haber
  Cc: Sean Christopherson, Bagas Sanjaya, linux-kernel,
	Linux Regressions, Linux KVM, Paolo Bonzini
In-Reply-To: <ZPiPkSY6NRzfWV5Z@torres.zugschlus.de>

* Marc Haber <mh+linux-kernel@zugschlus.de> [230906 14:41]:
> With my tools I have found out that it really seems to be related to the
> CPU of the host. I have changed my VM definition to "copy host CPU
> configuration to VM" in libvirt and have moved this very VM (image and
> settings) to hosts with a "Ryzen 5 Pro 4650G" and to an "Intel Xeon
> E3-1246" where they work flawlessly, while on both APUs I have available
> ("AMD G-T40E" and "AMD GX-412TC SOC") the regression in 6.5 shows. And
> if I boot other VMs on the APUs with 6.5 the issue comes up. It is a
> clear regression since going back to 4.6's serial code solves the issue
> on the APUs.

Not sure why the CPU matters here..

One thing to check is if you have these in your .config:

CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y

Or do you maybe have CONFIG_SERIAL_CORE=m as loadable module?

If you have CONFIG_SERIAL_CORE=m, maybe you need to modprobe serial_base
if you have some minimal rootfs that does not automatically do it for you.

Regards,

Tony

^ permalink raw reply

* Re: Linux 6.5 speed regression, boot VERY slow with anything systemd related
From: Tony Lindgren @ 2023-09-06 15:21 UTC (permalink / raw)
  To: Marc Haber
  Cc: Sean Christopherson, Bagas Sanjaya, linux-kernel,
	Linux Regressions, Linux KVM, Paolo Bonzini
In-Reply-To: <ZPiPkSY6NRzfWV5Z@torres.zugschlus.de>

* Marc Haber <mh+linux-kernel@zugschlus.de> [230906 14:41]:
> If I cannot see the host boot, I cannot debug, and if I cannot type into
> grub, I cannot find out whether removing the serial console from the
> kernel command line fixes the issue. I have removed the network
> interface to simplify things, so I need a working console.

I use something like this for a serial console:

-serial stdio -append "console=ttyS0 other kernel command line options"

Regards,

Tony

^ permalink raw reply

* [PATCH V2] KVM: SEV: Update SEV-ES shutdown intercepts with more metadata
From: Peter Gonda @ 2023-09-06 15:14 UTC (permalink / raw)
  To: kvm
  Cc: Peter Gonda, Paolo Bonzini, Sean Christopherson, Tom Lendacky,
	Joerg Roedel, Borislav Petkov, x86, linux-kernel

Currently, if an SEV-ES VM shuts down, userspace sees the KVM_RUN struct
with only INVALID_ARGUMENT. This is a very limited amount of information
to debug the situation. Instead, KVM can return
KVM_EXIT_SHUTDOWN to alert userspace that the VM is shutting down and
is not usable any further.

Signed-off-by: Peter Gonda <pgonda@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org

---
 arch/x86/kvm/svm/svm.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 956726d867aa..cecf6a528c9b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2131,12 +2131,14 @@ static int shutdown_interception(struct kvm_vcpu *vcpu)
 	 * The VM save area has already been encrypted so it
 	 * cannot be reinitialized - just terminate.
 	 */
-	if (sev_es_guest(vcpu->kvm))
-		return -EINVAL;
+	if (sev_es_guest(vcpu->kvm)) {
+		kvm_run->exit_reason = KVM_EXIT_SHUTDOWN;
+		return 0;
+	}
 
 	/*
 	 * VMCB is undefined after a SHUTDOWN intercept.  INIT the vCPU to put
-	 * the VMCB in a known good state.  Unfortuately, KVM doesn't have
+	 * the VMCB in a known good state.  Unfortunately, KVM doesn't have
 	 * KVM_MP_STATE_SHUTDOWN and can't add it without potentially breaking
 	 * userspace.  At a platform view, INIT is acceptable behavior as
 	 * there exist bare metal platforms that automatically INIT the CPU
-- 
2.42.0.283.g2d96d420d3-goog
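
From the VMM side, the difference this patch makes could be handled
roughly as follows. This is a hedged sketch, not code from any VMM:
classify_kvm_run() and the enum are hypothetical, and KVM_EXIT_SHUTDOWN is
defined locally with the value used by the UAPI header <linux/kvm.h> so
that the snippet builds on its own.

```c
#include <errno.h>

#ifndef KVM_EXIT_SHUTDOWN
#define KVM_EXIT_SHUTDOWN 8	/* same value as in <linux/kvm.h> */
#endif

enum run_outcome {
	RUN_CONTINUE,		/* some other exit; keep running the vCPU */
	RUN_GUEST_SHUTDOWN,	/* guest shut down; VM is no longer usable */
	RUN_ERROR,		/* ioctl failed; only errno is available */
};

/*
 * ioctl_ret and err are the return value and errno of ioctl(KVM_RUN);
 * exit_reason is kvm_run->exit_reason, meaningful only on success.
 */
static enum run_outcome classify_kvm_run(int ioctl_ret, int err,
					 unsigned int exit_reason)
{
	if (ioctl_ret < 0)
		return err == EINTR ? RUN_CONTINUE : RUN_ERROR;
	if (exit_reason == KVM_EXIT_SHUTDOWN)
		return RUN_GUEST_SHUTDOWN;
	return RUN_CONTINUE;
}
```

Before the change an SEV-ES shutdown lands in the RUN_ERROR branch with
EINVAL; with it, the same event is reported as RUN_GUEST_SHUTDOWN.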


^ permalink raw reply related

* Re: [PATCH 0/2] KVM: x86/mmu: .change_pte() optimization in TDP MMU
From: Sean Christopherson @ 2023-09-06 14:44 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Yan Zhao, kvm, linux-kernel, pbonzini, Christoph Hellwig,
	Marek Szyprowski, Linus Torvalds
In-Reply-To: <5ff1591c-d41c-331f-84a6-ac690c48ff5d@arm.com>

On Wed, Sep 06, 2023, Robin Murphy wrote:
> On 2023-09-05 19:59, Sean Christopherson wrote:
> > And if the driver *doesn't* initialize the data, then the copy is at best pointless,
> > and possibly even worse than leaking stale swiotlb data.
> 
> Other than the overhead, done right it can't be any worse than if SWIOTLB
> were not involved at all.

Yep.

> > Looking at commit ddbd89deb7d3 ("swiotlb: fix info leak with DMA_FROM_DEVICE"),
> > IIUC the data leak was observed with a synthetic test "driver" that was developed
> > to verify a real data leak fixed by commit a45b599ad808 ("scsi: sg: allocate with
> > __GFP_ZERO in sg_build_indirect()").  Which basically proves my point: copying
> > from the source only adds value absent a bug in the owning driver.
> 
> Huh? IIUC the bug there was that the SCSI layer failed to sanitise
> partially-written buffers. That bug was fixed, and the scrutiny therein
> happened to reveal that SWIOTLB *also* had a lower-level problem with
> partial writes, in that it was corrupting DMA-mapped memory which was not
> updated by the device. Partial DMA writes are not in themselves indicative
> of a bug, they may well be a valid and expected behaviour.

The problem is that the comment only talks about leaking data to userspace, and
doesn't say anything about data corruption or the "swiotlb needs to match hardware"
justification that Linus pointed out.  I buy both of those arguments for copying
data from the original page, but the "may prevent leaking swiotlb content" is IMO
completely nonsensical, because if preventing leakage is the only goal, then
explicitly initializing the memory is better in every way.

If no one objects, I'll put together a patch to rewrite the comment in terms of
mimicking hardware and not corrupting the caller's data.
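
The three options discussed here (leave the bounce slot uninitialized,
zero it, or copy from the original buffer) can be compared with a small
userspace model of a partial DMA write. Nothing below is swiotlb code; the
enum, the function and the buffer size are made up for illustration.

```c
#include <string.h>

enum bounce_init { BOUNCE_NONE, BOUNCE_ZERO, BOUNCE_COPY_ORIG };

#define BUF_LEN 8

/*
 * Model a DMA_FROM_DEVICE transfer through a bounce slot in which the
 * device writes only the first dev_len bytes. orig is the caller's
 * buffer, slot is reused bounce memory that may hold stale data.
 */
static void bounce_from_device(unsigned char *orig, unsigned char *slot,
			       enum bounce_init init,
			       const unsigned char *dev_data, size_t dev_len)
{
	/* Map: optionally initialize the slot. */
	if (init == BOUNCE_ZERO)
		memset(slot, 0, BUF_LEN);
	else if (init == BOUNCE_COPY_ORIG)
		memcpy(slot, orig, BUF_LEN);

	/* The device performs a partial write into the slot. */
	memcpy(slot, dev_data, dev_len);

	/* Unmap: the whole slot is copied back to the caller. */
	memcpy(orig, slot, BUF_LEN);
}
```

With BOUNCE_NONE the unwritten tail of the caller's buffer ends up holding
stale slot content, with BOUNCE_ZERO it is overwritten with zeros, and with
BOUNCE_COPY_ORIG it is left as it was, which is what a device writing
directly to the buffer would do.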

> > IMO, rather than copying from the original memory, swiotlb_tbl_map_single() should
> > simply zero the original page(s) when establishing the mapping.  That would harden
> > all usage of swiotlb and avoid the read-before-write behavior that is problematic
> > for KVM.
> 
> Depends on one's exact definition of "harden"... Corrupting memory with
> zeros is less bad than corrupting memory with someone else's data if you
> look at it from an information security point of view, but from a
> not-corrupting-memory point of view it's definitely still corrupting memory
> :/
> 
> Taking a step back, is there not an argument that if people care about
> general KVM performance then they should maybe stop emulating obsolete PC
> hardware from 30 years ago, and at least emulate obsolete PC hardware from
> 20 years ago that supports 64-bit DMA?

Heh, I don't think there's an argument per se, people most definitely shouldn't
be emulating old hardware if they care about performance.  I already told Yan as
much.

> Even non-virtualised, SWIOTLB is pretty horrible for I/O performance by its
> very nature - avoiding it if at all possible should always be preferred.

Yeah.  The main reason I didn't just sweep this under the rug is the confidential
VM use case, where SWIOTLB is used to bounce data from guest private memory into
shared buffers.  There's also a good argument that anyone that cares about I/O
performance in confidential VMs should put in the effort to enlighten their device
drivers to use shared memory directly, but practically speaking that's easier said
than done.

^ permalink raw reply

* Re: Linux 6.5 speed regression, boot VERY slow with anything systemd related
From: Marc Haber @ 2023-09-06 14:41 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Sean Christopherson, Bagas Sanjaya, linux-kernel,
	Linux Regressions, Linux KVM, Paolo Bonzini
In-Reply-To: <20230901122431.GU11676@atomide.com>

On Fri, Sep 01, 2023 at 03:24:31PM +0300, Tony Lindgren wrote:
> Yes two somewhat minimal qemu command lines for working and failing test
> case sure would help to debug this.

I have spent some time with that but have failed so far. I would appreciate
help about which qemu option I'd need to get a serial console configured
AND to get access to this serial console, alternatively get access to a
VNC console.

I have the following qemu start script so far (command line pulled from
libvirt log and simplified):
export LC_ALL=C
export QEMU_AUDIO_DRV=spice

/usr/bin/qemu-system-x86_64 \
-name guest=lasso2,debug-threads=on \
-S \
-machine pc-i440fx-2.1,accel=kvm,usb=off,dump-guest-core=off \
-m 768 \
-realtime mlock=off \
-smp 1,sockets=1,cores=1,threads=1 \
-uuid 7954f7a6-9418-4ab5-9571-97ccbea263ec \
-no-user-config \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-nodefaults \
-global PIIX4_PM.disable_s3=1 \
-global PIIX4_PM.disable_s4=1 \
-boot strict=on \
-device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 \
-device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 \
-device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 \
-device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 \
-device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 \
-drive file=/dev/prom/lasso2,format=raw,if=none,id=drive-virtio-disk0,cache=none,discard=unmap,aio=native \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,write-cache=on \
-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 \
-msg timestamp=on \
-device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 \
-vnc :1

The quoted qemu command line will listen on port 5901, but trying to
connect with tightvncviewer or vinagre yields an immediate RST. 

If I cannot see the host boot, I cannot debug, and if I cannot type into
grub, I cannot find out whether removing the serial console from the
kernel command line fixes the issue. I have removed the network
interface to simplify things, so I need a working console.

With my tools I have found out that it really seems to be related to the
CPU of the host. I have changed my VM definition to "copy host CPU
configuration to VM" in libvirt and have moved this very VM (image and
settings) to hosts with a "Ryzen 5 Pro 4650G" and to an "Intel Xeon
E3-1246" where they work flawlessly, while on both APUs I have available
("AMD G-T40E" and "AMD GX-412TC SOC") the regression in 6.5 shows. And
if I boot other VMs on the APUs with 6.5 the issue comes up. It is a
clear regression since going back to 4.6's serial code solves the issue
on the APUs.

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

^ permalink raw reply

* Re: [PATCH v2 04/16] kvm: Return number of free memslots
From: David Hildenbrand @ 2023-09-06 14:37 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel
  Cc: Paolo Bonzini, Igor Mammedov, Xiao Guangrong, Michael S. Tsirkin,
	Peter Xu, Eduardo Habkost, Marcel Apfelbaum, Yanan Wang,
	Michal Privoznik, Daniel P . Berrangé, Gavin Shan,
	Alex Williamson, Stefan Hajnoczi, Maciej S . Szmigiero, kvm
In-Reply-To: <ee1bbc2b-3180-ab79-4f0d-6159577b2164@redhat.com>

On 06.09.23 16:14, David Hildenbrand wrote:
> On 29.08.23 00:26, Philippe Mathieu-Daudé wrote:
>> On 25/8/23 15:21, David Hildenbrand wrote:
>>> Let's return the number of free slots instead of only checking if there
>>> is a free slot. While at it, check all address spaces, which will also
>>> consider SMM under x86 correctly.
>>>
>>> Make the stub return UINT_MAX, such that we can call the function
>>> unconditionally.
>>>
>>> This is a preparation for memory devices that consume multiple memslots.
>>>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> ---
>>>     accel/kvm/kvm-all.c      | 33 ++++++++++++++++++++-------------
>>>     accel/stubs/kvm-stub.c   |  4 ++--
>>>     hw/mem/memory-device.c   |  2 +-
>>>     include/sysemu/kvm.h     |  2 +-
>>>     include/sysemu/kvm_int.h |  1 +
>>>     5 files changed, 25 insertions(+), 17 deletions(-)
>>
>>
>>> diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
>>> index 235dc661bc..f39997d86e 100644
>>> --- a/accel/stubs/kvm-stub.c
>>> +++ b/accel/stubs/kvm-stub.c
>>> @@ -109,9 +109,9 @@ int kvm_irqchip_remove_irqfd_notifier_gsi(KVMState *s, EventNotifier *n,
>>>         return -ENOSYS;
>>>     }
>>>     
>>> -bool kvm_has_free_slot(MachineState *ms)
>>> +unsigned int kvm_get_free_memslots(void)
>>>     {
>>> -    return false;
>>> +    return UINT_MAX;
>>
>> Isn't it clearer returning 0 here and keeping kvm_enabled() below?
> 
> I tried doing it similarly to vhost_has_free_slot().
> 

I'll leave the kvm_enabled() check in place, looks cleaner.
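
For reference, the two styles under discussion differ only in where the
"KVM is not in use" case is handled. The sketch below is standalone and
uses hypothetical stand-ins (a global flag instead of kvm_enabled(), a
fixed free-slot count); it is not the QEMU code.

```c
#include <limits.h>
#include <stdbool.h>

static bool kvm_in_use;			/* stand-in for kvm_enabled() */
static unsigned int free_slots = 3;	/* hypothetical free memslot count */

/* Style A: stub-like behaviour, "unlimited" when KVM is not used. */
static unsigned int get_free_memslots_or_max(void)
{
	return kvm_in_use ? free_slots : UINT_MAX;
}

/* Caller for style A: no kvm_enabled() check needed. */
static bool can_plug_unconditional(unsigned int required)
{
	return required <= get_free_memslots_or_max();
}

/* Caller for style B: explicit check, so the stub could return 0. */
static bool can_plug_checked(unsigned int required)
{
	if (!kvm_in_use)
		return true;
	return required <= free_slots;
}
```

Both callers give the same answers; style B just makes the accelerator
check visible at the call site.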

-- 
Cheers,

David / dhildenb


^ permalink raw reply

* Re: [PATCH v2 04/16] kvm: Return number of free memslots
From: David Hildenbrand @ 2023-09-06 14:14 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel
  Cc: Paolo Bonzini, Igor Mammedov, Xiao Guangrong, Michael S. Tsirkin,
	Peter Xu, Eduardo Habkost, Marcel Apfelbaum, Yanan Wang,
	Michal Privoznik, Daniel P . Berrangé, Gavin Shan,
	Alex Williamson, Stefan Hajnoczi, Maciej S . Szmigiero, kvm
In-Reply-To: <1d68ca74-ce92-ca5f-2c8b-e4567265e2fc@linaro.org>

On 29.08.23 00:26, Philippe Mathieu-Daudé wrote:
> On 25/8/23 15:21, David Hildenbrand wrote:
>> Let's return the number of free slots instead of only checking if there
>> is a free slot. While at it, check all address spaces, which will also
>> consider SMM under x86 correctly.
>>
>> Make the stub return UINT_MAX, such that we can call the function
>> unconditionally.
>>
>> This is a preparation for memory devices that consume multiple memslots.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>    accel/kvm/kvm-all.c      | 33 ++++++++++++++++++++-------------
>>    accel/stubs/kvm-stub.c   |  4 ++--
>>    hw/mem/memory-device.c   |  2 +-
>>    include/sysemu/kvm.h     |  2 +-
>>    include/sysemu/kvm_int.h |  1 +
>>    5 files changed, 25 insertions(+), 17 deletions(-)
> 
> 
>> diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
>> index 235dc661bc..f39997d86e 100644
>> --- a/accel/stubs/kvm-stub.c
>> +++ b/accel/stubs/kvm-stub.c
>> @@ -109,9 +109,9 @@ int kvm_irqchip_remove_irqfd_notifier_gsi(KVMState *s, EventNotifier *n,
>>        return -ENOSYS;
>>    }
>>    
>> -bool kvm_has_free_slot(MachineState *ms)
>> +unsigned int kvm_get_free_memslots(void)
>>    {
>> -    return false;
>> +    return UINT_MAX;
> 
> Isn't it clearer returning 0 here and keeping kvm_enabled() below?

I tried doing it similarly to vhost_has_free_slot().

Also simplifies patch #12 :)

No strong opinion, though.

> 
> 
>> diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
>> index 511b42bde5..8b09e78b12 100644
>> --- a/include/sysemu/kvm_int.h
>> +++ b/include/sysemu/kvm_int.h
>> @@ -40,6 +40,7 @@ typedef struct KVMMemoryUpdate {
>>    typedef struct KVMMemoryListener {
>>        MemoryListener listener;
>>        KVMSlot *slots;
>> +    int nr_used_slots;
> 
> Preferably using 'unsigned' here:

Sure, that should work.

> 
> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>

Thanks!

-- 
Cheers,

David / dhildenb


^ permalink raw reply

* Re: [PATCH v2 3/8] tools: riscv: Add header file csr.h
From: Andrew Jones @ 2023-09-06 13:47 UTC (permalink / raw)
  To: Haibo Xu
  Cc: Haibo Xu, Paul Walmsley, Palmer Dabbelt, Albert Ou, Paolo Bonzini,
	Shuah Khan, Marc Zyngier, Oliver Upton, James Morse,
	Suzuki K Poulose, Zenghui Yu, Anup Patel, Atish Patra, Guo Ren,
	Conor Dooley, Daniel Henrique Barboza, wchen, Sean Christopherson,
	Ricardo Koller, Vishal Annapurve, Vipin Sharma, Aaron Lewis,
	David Matlack, Vitaly Kuznetsov, Ackerley Tng, Mingwei Zhang,
	Lei Wang, Maxim Levitsky, Peter Gonda,
	Philippe Mathieu-Daudé, Thomas Huth, Like Xu,
	David Woodhouse, Michal Luczaj, zhang songyi, linux-kernel,
	linux-riscv, kvm, linux-kselftest, linux-arm-kernel, kvmarm,
	kvm-riscv
In-Reply-To: <CAJve8ok-Z6VCziFj5t0=BoouZ-VLyGaqEng-dYGTFnP-CR36kw@mail.gmail.com>

On Wed, Sep 06, 2023 at 05:09:20PM +0800, Haibo Xu wrote:
> On Wed, Sep 6, 2023 at 3:13 PM Andrew Jones <ajones@ventanamicro.com> wrote:
> >
> > On Wed, Sep 06, 2023 at 02:35:42PM +0800, Haibo Xu wrote:
> > > On Mon, Sep 4, 2023 at 9:33 PM Andrew Jones <ajones@ventanamicro.com> wrote:
> > > >
> > > > On Sat, Sep 02, 2023 at 08:59:25PM +0800, Haibo Xu wrote:
> > > > > Borrow the csr definitions and operations from kernel's
> > > > > arch/riscv/include/asm/csr.h to tools/ for riscv.
> > > > >
> > > > > Signed-off-by: Haibo Xu <haibo1.xu@intel.com>
> > > > > ---
> > > > >  tools/arch/riscv/include/asm/csr.h | 521 +++++++++++++++++++++++++++++
> > > > >  1 file changed, 521 insertions(+)
> > > > >  create mode 100644 tools/arch/riscv/include/asm/csr.h
> > > > >
> > > > > diff --git a/tools/arch/riscv/include/asm/csr.h b/tools/arch/riscv/include/asm/csr.h
> > > > > new file mode 100644
> > > > > index 000000000000..4e86c82aacbd
> > > > > --- /dev/null
> > > > > +++ b/tools/arch/riscv/include/asm/csr.h
> > > > > @@ -0,0 +1,521 @@
> > > > > +/* SPDX-License-Identifier: GPL-2.0-only */
> > > > > +/*
> > > > > + * Copyright (C) 2015 Regents of the University of California
> > > > > + */
> > > > > +
> > > > > +#ifndef _ASM_RISCV_CSR_H
> > > > > +#define _ASM_RISCV_CSR_H
> > > > > +
> > > > > +#include <linux/bits.h>
> > > > > +
> > > > > +/* Status register flags */
> > > > > +#define SR_SIE               _AC(0x00000002, UL) /* Supervisor Interrupt Enable */
> > > > > +#define SR_MIE               _AC(0x00000008, UL) /* Machine Interrupt Enable */
> > > > > +#define SR_SPIE              _AC(0x00000020, UL) /* Previous Supervisor IE */
> > > > > +#define SR_MPIE              _AC(0x00000080, UL) /* Previous Machine IE */
> > > > > +#define SR_SPP               _AC(0x00000100, UL) /* Previously Supervisor */
> > > > > +#define SR_MPP               _AC(0x00001800, UL) /* Previously Machine */
> > > > > +#define SR_SUM               _AC(0x00040000, UL) /* Supervisor User Memory Access */
> > > > > +
> > > > > +#define SR_FS                _AC(0x00006000, UL) /* Floating-point Status */
> > > > > +#define SR_FS_OFF    _AC(0x00000000, UL)
> > > > > +#define SR_FS_INITIAL        _AC(0x00002000, UL)
> > > > > +#define SR_FS_CLEAN  _AC(0x00004000, UL)
> > > > > +#define SR_FS_DIRTY  _AC(0x00006000, UL)
> > > > > +
> > > > > +#define SR_VS                _AC(0x00000600, UL) /* Vector Status */
> > > > > +#define SR_VS_OFF    _AC(0x00000000, UL)
> > > > > +#define SR_VS_INITIAL        _AC(0x00000200, UL)
> > > > > +#define SR_VS_CLEAN  _AC(0x00000400, UL)
> > > > > +#define SR_VS_DIRTY  _AC(0x00000600, UL)
> > > > > +
> > > > > +#define SR_XS                _AC(0x00018000, UL) /* Extension Status */
> > > > > +#define SR_XS_OFF    _AC(0x00000000, UL)
> > > > > +#define SR_XS_INITIAL        _AC(0x00008000, UL)
> > > > > +#define SR_XS_CLEAN  _AC(0x00010000, UL)
> > > > > +#define SR_XS_DIRTY  _AC(0x00018000, UL)
> > > > > +
> > > > > +#define SR_FS_VS     (SR_FS | SR_VS) /* Vector and Floating-Point Unit */
> > > > > +
> > > > > +#ifndef CONFIG_64BIT
> > > >
> > > > How do we ensure CONFIG_64BIT is set?
> > > >
> > >
> > > Currently, no explicit checking for this.
> > > Shall we add a gatekeeper in this file to ensure it is set?
> >
> > Not in this file, since this file is shared by all the tools and...
> >
> > >
> > > #ifndef CONFIG_64BIT
> > > #error "CONFIG_64BIT was not set"
> > > #endif
> >
> > ...we'll surely hit this error right now since nothing is setting
> > CONFIG_64BIT when compiling KVM selftests.
> >
> > We need to define CONFIG_64BIT in the build somewhere prior to any
> > headers which depend on it being included. Maybe we can simply
> > add -DCONFIG_64BIT to CFLAGS, since all KVM selftests supported
> > architectures are 64-bit.
> >
> 
> Makes sense! Another option would be to just add "#define CONFIG_64BIT" at
> the beginning of csr.h

Nope, other tools/tests may want to include csr.h someday and they may or
may not be targeting 64-bit. They'll need to appropriately set
CONFIG_64BIT themselves. We could require

#define CONFIG_64BIT
#include <asm/csr.h>

everywhere we include it, but that's error prone since it'll get forgotten
and nothing will complain unless a define which isn't also present in
!CONFIG_64BIT is used.
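
The failure mode is easy to reproduce outside the kernel tree. In the
sketch below, the SR_SD values mirror the CONFIG_64BIT-dependent
definition in the kernel's csr.h (written with plain ULL suffixes instead
of _AC()), while sr_sd_mask() is a hypothetical helper. Built without
-DCONFIG_64BIT, it compiles cleanly and silently uses the 32-bit value.

```c
/* Mirrors the CONFIG_64BIT-dependent block in csr.h. */
#ifndef CONFIG_64BIT
#define SR_SD 0x80000000ULL		/* FS/VS/XS dirty, RV32 position */
#else
#define SR_SD 0x8000000000000000ULL	/* FS/VS/XS dirty, RV64 position */
#endif

/* Hypothetical helper: returns whichever SR_SD the build selected. */
static unsigned long long sr_sd_mask(void)
{
	return SR_SD;
}
```

Passing -DCONFIG_64BIT on the compiler command line flips every such block
at once, with no per-file define to forget.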

Thanks,
drew

^ permalink raw reply

* Re: [PATCH] iommu/amd: remove amd_iommu_snp_enable
From: Jason Gunthorpe @ 2023-09-06 13:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Kim Phillips, joro, suravee.suthikulpanit, iommu, Michael Roth,
	Kalra, Ashish, kvm@vger.kernel.org, linux-coco
In-Reply-To: <20230901055020.GA31908@lst.de>

On Fri, Sep 01, 2023 at 07:50:20AM +0200, Christoph Hellwig wrote:
> On Thu, Aug 31, 2023 at 01:03:53PM -0500, Kim Phillips wrote:
> > +Mike Roth, Ashish
> >
> > On 8/31/23 7:31 AM, Christoph Hellwig wrote:
> >> amd_iommu_snp_enable is unused and has been since it was added in commit
> >> fb2accadaa94 ("iommu/amd: Introduce function to check and enable SNP").
> >>
> >> Signed-off-by: Christoph Hellwig <hch@lst.de>
> >> ---
> >
> > It is used by the forthcoming host SNP support:
> >
> > https://lore.kernel.org/lkml/20230612042559.375660-8-michael.roth@amd.com/
> 
> Then resend it with that support, but don't waste resources and everyone's
> time now.

+1

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

I've said this many times lately. There are other things in this
driver that have no upstream justification too, like nesting
"support".

Please organize this SNP support into series that makes sense and are
self complete :( I'm not sure a 51 patch series is a productive way to
approach this..

Jason

^ permalink raw reply

* Re: [PATCH V7 vfio 07/10] vfio/mlx5: Create and destroy page tracker object
From: Joao Martins @ 2023-09-06 12:08 UTC (permalink / raw)
  To: Jason Gunthorpe, Cédric Le Goater
  Cc: Yishai Hadas, alex.williamson, saeedm, kvm, netdev, kuba,
	kevin.tian, leonro, maorg, cohuck, 'Avihai Horon',
	Tarun Gupta
In-Reply-To: <ZPhnvqmvdeBMzafd@nvidia.com>

On 06/09/2023 12:51, Jason Gunthorpe wrote:
> On Wed, Sep 06, 2023 at 10:55:26AM +0200, Cédric Le Goater wrote:
> 
>>> +	WARN_ON(node);
>>> +	log_addr_space_size = ilog2(total_ranges_len);
>>> +	if (log_addr_space_size <
>>> +	    (MLX5_CAP_ADV_VIRTUALIZATION(mdev, pg_track_log_min_addr_space)) ||
>>> +	    log_addr_space_size >
>>> +	    (MLX5_CAP_ADV_VIRTUALIZATION(mdev, pg_track_log_max_addr_space))) {
>>> +		err = -EOPNOTSUPP;
>>> +		goto out;
>>> +	}
>>
>>
>> We are seeing an issue with dirty page tracking when doing migration
>> of an OVMF VM guest. The vfio-pci variant driver for the MLX5 VF
>> device complains when dirty page tracking is initialized from QEMU :
>>
>>   qemu-kvm: 0000:b1:00.2: Failed to start DMA logging, err -95 (Operation not supported)
>>
>> The 64-bit computed range is  :
>>
>>   vfio_device_dirty_tracking_start nr_ranges 2 32:[0x0 - 0x807fffff], 64:[0x100000000 - 0x3838000fffff]
>>
>> which seems to be too large for the HW. AFAICT, the MLX5 HW has a 42
>> bits address space limitation for dirty tracking (min is 12). Is it a
>> FW tunable or a strict limitation ?
> 
> It would be good to explain where this is coming from, all devices
> need to make some decision on what address space ranges to track and I
> would say 2^42 is already pretty generous limit..
> 
> Can we go the other direction and reduce the ranges qemu is interested
> in?

There's also a chance that these are those 16x-32x socket Intel machines with
48T-64T of memory (judging from the ranges alone). Meaning that these ranges
even if reduced wouldn't remove much of the aggregate address space width.
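
The rejection can be checked by hand from the reported ranges. The sketch
below is a userspace approximation: ilog2_u64() is a simple stand-in for
the kernel's ilog2(), the limits 12 and 42 are the values quoted in the
report rather than values read from a device, and summing the range
lengths is an assumption about how total_ranges_len is derived.

```c
#include <errno.h>
#include <stdint.h>

/* Floor of log2(v) for v > 0; stand-in for the kernel's ilog2(). */
static unsigned int ilog2_u64(uint64_t v)
{
	unsigned int l = 0;

	while (v >>= 1)
		l++;
	return l;
}

struct range { uint64_t first, last; };	/* inclusive bounds */

/* Same shape as the quoted check, with the limits passed in. */
static int check_tracked_space(const struct range *r, unsigned int n,
			       unsigned int log_min, unsigned int log_max)
{
	uint64_t total = 0;
	unsigned int i, log_size;

	for (i = 0; i < n; i++)
		total += r[i].last - r[i].first + 1;

	log_size = ilog2_u64(total);
	if (log_size < log_min || log_size > log_max)
		return -EOPNOTSUPP;
	return 0;
}
```

With the two reported ranges the total is on the order of 2^45 bytes, so
the check returns -EOPNOTSUPP against a 42-bit maximum, matching the
"Operation not supported" error in the log. The 32-bit range alone would
pass.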

^ permalink raw reply

* Re: [PATCH V7 vfio 07/10] vfio/mlx5: Create and destroy page tracker object
From: Jason Gunthorpe @ 2023-09-06 11:51 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Yishai Hadas, alex.williamson, saeedm, kvm, netdev, kuba,
	kevin.tian, joao.m.martins, leonro, maorg, cohuck,
	'Avihai Horon', Tarun Gupta
In-Reply-To: <9a4ddb8c-a48a-67b0-b8ad-428ee936454e@kaod.org>

On Wed, Sep 06, 2023 at 10:55:26AM +0200, Cédric Le Goater wrote:

> > +	WARN_ON(node);
> > +	log_addr_space_size = ilog2(total_ranges_len);
> > +	if (log_addr_space_size <
> > +	    (MLX5_CAP_ADV_VIRTUALIZATION(mdev, pg_track_log_min_addr_space)) ||
> > +	    log_addr_space_size >
> > +	    (MLX5_CAP_ADV_VIRTUALIZATION(mdev, pg_track_log_max_addr_space))) {
> > +		err = -EOPNOTSUPP;
> > +		goto out;
> > +	}
> 
> 
> We are seeing an issue with dirty page tracking when doing migration
> of an OVMF VM guest. The vfio-pci variant driver for the MLX5 VF
> device complains when dirty page tracking is initialized from QEMU :
> 
>   qemu-kvm: 0000:b1:00.2: Failed to start DMA logging, err -95 (Operation not supported)
> 
> The 64-bit computed range is  :
> 
>   vfio_device_dirty_tracking_start nr_ranges 2 32:[0x0 - 0x807fffff], 64:[0x100000000 - 0x3838000fffff]
> 
> which seems to be too large for the HW. AFAICT, the MLX5 HW has a 42
> bits address space limitation for dirty tracking (min is 12). Is it a
> FW tunable or a strict limitation ?

It would be good to explain where this is coming from, all devices
need to make some decision on what address space ranges to track and I
would say 2^42 is already pretty generous limit..

Can we go the other direction and reduce the ranges qemu is interested
in?

Jason

^ permalink raw reply
