public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
* [REGRESSION] from 74a0e79df68a8042fb84fd7207e57b70722cf825: VFIO PCI passthrough no longer works
@ 2024-12-10  9:04 Simon Pilkington
  2024-12-10 15:47 ` Sean Christopherson
  0 siblings, 1 reply; 9+ messages in thread
From: Simon Pilkington @ 2024-12-10  9:04 UTC (permalink / raw)
  To: seanjc; +Cc: linux-kernel, kvm, regressions

Hi,

With the aforementioned commit I am no longer able to use PCI
passthrough to a
Windows guest on the X570 chipset with a 5950X CPU.

The minimal reproducer for me is to attach a GPU to the VM and attempt
to start
Windows setup from an iso image. The VM will apparently livelock at the
setup splash
screen before the spinner appears as one of my CPU cores goes up to 100%
usage
until I force off the VM. This could be very machine-specific though.

Reverting to the old XOR check fixes both 6.12.y stable and 6.13-rc2 for me.
Otherwise they're both bad. Can you please look into it? I can share the
config
I used for test builds if it would help.

Regards,
Simon

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [REGRESSION] from 74a0e79df68a8042fb84fd7207e57b70722cf825: VFIO PCI passthrough no longer works
  2024-12-10  9:04 [REGRESSION] from 74a0e79df68a8042fb84fd7207e57b70722cf825: VFIO PCI passthrough no longer works Simon Pilkington
@ 2024-12-10 15:47 ` Sean Christopherson
  2024-12-10 16:15   ` Tom Lendacky
  2024-12-10 20:33   ` Simon Pilkington
  0 siblings, 2 replies; 9+ messages in thread
From: Sean Christopherson @ 2024-12-10 15:47 UTC (permalink / raw)
  To: Simon Pilkington; +Cc: linux-kernel, kvm, regressions, Tom Lendacky

+Tom

On Tue, Dec 10, 2024, Simon Pilkington wrote:
> Hi,
> 
> With the aforementioned commit I am no longer able to use PCI passthrough to
> a Windows guest on the X570 chipset with a 5950X CPU.
> 
> The minimal reproducer for me is to attach a GPU to the VM and attempt to
> start Windows setup from an iso image. The VM will apparently livelock at the
> setup splash screen before the spinner appears as one of my CPU cores goes up
> to 100% usage until I force off the VM. This could be very machine-specific
> though.

Ugh.  Yeah, it's pretty much guaranteed to be CPU specific behavior.

Tom, any idea what the guest might be trying to do?  It probably doesn't matter
in the end, it's not like KVM does anything with the value...

> Reverting to the old XOR check fixes both 6.12.y stable and 6.13-rc2 for me.
> Otherwise they're both bad. Can you please look into it? I can share the
> config I used for test builds if it would help.

Can you run with the below to see what bits the guest is trying to set (or clear)?
We could get the same info via tracepoints, but this will likely be faster/easier.

---
 arch/x86/kvm/svm/svm.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index dd15cc635655..5144d0283c9d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3195,11 +3195,14 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 	case MSR_AMD64_DE_CFG: {
 		u64 supported_de_cfg;
 
-		if (svm_get_feature_msr(ecx, &supported_de_cfg))
+		if (WARN_ON_ONCE(svm_get_feature_msr(ecx, &supported_de_cfg)))
 			return 1;
 
-		if (data & ~supported_de_cfg)
+		if (data & ~supported_de_cfg) {
+			pr_warn("DE_CFG supported = %llx, WRMSR = %llx\n",
+				supported_de_cfg, data);
 			return 1;
+		}
 
 		/*
 		 * Don't let the guest change the host-programmed value.  The
@@ -3207,8 +3210,11 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 		 * are completely unknown to KVM, and the one bit known to KVM
 		 * is simply a reflection of hardware capabilities.
 		 */
-		if (!msr->host_initiated && data != svm->msr_decfg)
+		if (!msr->host_initiated && data != svm->msr_decfg) {
+			pr_warn("DE_CFG current = %llx, WRMSR = %llx\n",
+				svm->msr_decfg, data);
 			return 1;
+		}
 
 		svm->msr_decfg = data;
 		break;

base-commit: fac04efc5c793dccbd07e2d59af9f90b7fc0dca4
-- 

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [REGRESSION] from 74a0e79df68a8042fb84fd7207e57b70722cf825: VFIO PCI passthrough no longer works
  2024-12-10 15:47 ` Sean Christopherson
@ 2024-12-10 16:15   ` Tom Lendacky
  2024-12-10 20:33   ` Simon Pilkington
  1 sibling, 0 replies; 9+ messages in thread
From: Tom Lendacky @ 2024-12-10 16:15 UTC (permalink / raw)
  To: Sean Christopherson, Simon Pilkington; +Cc: linux-kernel, kvm, regressions

On 12/10/24 09:47, Sean Christopherson wrote:
> +Tom
> 
> On Tue, Dec 10, 2024, Simon Pilkington wrote:
>> Hi,
>>
>> With the aforementioned commit I am no longer able to use PCI passthrough to
>> a Windows guest on the X570 chipset with a 5950X CPU.
>>
>> The minimal reproducer for me is to attach a GPU to the VM and attempt to
>> start Windows setup from an iso image. The VM will apparently livelock at the
>> setup splash screen before the spinner appears as one of my CPU cores goes up
>> to 100% usage until I force off the VM. This could be very machine-specific
>> though.
> 
> Ugh.  Yeah, it's pretty much guaranteed to be CPU specific behavior.
> 
> Tom, any idea what the guest might be trying to do?  It probably doesn't matter
> in the end, it's not like KVM does anything with the value...

No clue. I do see that in Linux there is a zenbleed-related bit that is
used in DE_CFG:

  522b1d69219d ("x86/cpu/amd: Add a Zenbleed fix")

I wonder if it might be related to that.

Your suggestion below to see what bits are being requested might shed
some light on it.

Thanks,
Tom

> 
>> Reverting to the old XOR check fixes both 6.12.y stable and 6.13-rc2 for me.
>> Otherwise they're both bad. Can you please look into it? I can share the
>> config I used for test builds if it would help.
> 
> Can you run with the below to see what bits the guest is trying to set (or clear)?
> We could get the same info via tracepoints, but this will likely be faster/easier.
> 
> ---
>  arch/x86/kvm/svm/svm.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index dd15cc635655..5144d0283c9d 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3195,11 +3195,14 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>  	case MSR_AMD64_DE_CFG: {
>  		u64 supported_de_cfg;
>  
> -		if (svm_get_feature_msr(ecx, &supported_de_cfg))
> +		if (WARN_ON_ONCE(svm_get_feature_msr(ecx, &supported_de_cfg)))
>  			return 1;
>  
> -		if (data & ~supported_de_cfg)
> +		if (data & ~supported_de_cfg) {
> +			pr_warn("DE_CFG supported = %llx, WRMSR = %llx\n",
> +				supported_de_cfg, data);
>  			return 1;
> +		}
>  
>  		/*
>  		 * Don't let the guest change the host-programmed value.  The
> @@ -3207,8 +3210,11 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>  		 * are completely unknown to KVM, and the one bit known to KVM
>  		 * is simply a reflection of hardware capabilities.
>  		 */
> -		if (!msr->host_initiated && data != svm->msr_decfg)
> +		if (!msr->host_initiated && data != svm->msr_decfg) {
> +			pr_warn("DE_CFG current = %llx, WRMSR = %llx\n",
> +				svm->msr_decfg, data);
>  			return 1;
> +		}
>  
>  		svm->msr_decfg = data;
>  		break;
> 
> base-commit: fac04efc5c793dccbd07e2d59af9f90b7fc0dca4

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [REGRESSION] from 74a0e79df68a8042fb84fd7207e57b70722cf825: VFIO PCI passthrough no longer works
  2024-12-10 15:47 ` Sean Christopherson
  2024-12-10 16:15   ` Tom Lendacky
@ 2024-12-10 20:33   ` Simon Pilkington
  2024-12-10 21:06     ` Tom Lendacky
  1 sibling, 1 reply; 9+ messages in thread
From: Simon Pilkington @ 2024-12-10 20:33 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: linux-kernel, kvm, regressions, Tom Lendacky

On 10/12/2024 16:47, Sean Christopherson wrote:
> Can you run with the below to see what bits the guest is trying to set (or clear)?
> We could get the same info via tracepoints, but this will likely be faster/easier.
> 
> ---
>  arch/x86/kvm/svm/svm.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index dd15cc635655..5144d0283c9d 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3195,11 +3195,14 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>  	case MSR_AMD64_DE_CFG: {
>  		u64 supported_de_cfg;
>  
> -		if (svm_get_feature_msr(ecx, &supported_de_cfg))
> +		if (WARN_ON_ONCE(svm_get_feature_msr(ecx, &supported_de_cfg)))
>  			return 1;
>  
> -		if (data & ~supported_de_cfg)
> +		if (data & ~supported_de_cfg) {
> +			pr_warn("DE_CFG supported = %llx, WRMSR = %llx\n",
> +				supported_de_cfg, data);
>  			return 1;
> +		}
>  
>  		/*
>  		 * Don't let the guest change the host-programmed value.  The
> @@ -3207,8 +3210,11 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>  		 * are completely unknown to KVM, and the one bit known to KVM
>  		 * is simply a reflection of hardware capabilities.
>  		 */
> -		if (!msr->host_initiated && data != svm->msr_decfg)
> +		if (!msr->host_initiated && data != svm->msr_decfg) {
> +			pr_warn("DE_CFG current = %llx, WRMSR = %llx\n",
> +				svm->msr_decfg, data);
>  			return 1;
> +		}
>  
>  		svm->msr_decfg = data;
>  		break;
> 
> base-commit: fac04efc5c793dccbd07e2d59af9f90b7fc0dca4

Relevant dmesg output with some context below. VM locked up as expected.

[   85.834971] vfio-pci 0000:0c:00.0: resetting
[   85.937573] vfio-pci 0000:0c:00.0: reset done
[   86.494210] vfio-pci 0000:0c:00.0: resetting
[   86.494264] vfio-pci 0000:0c:00.1: resetting
[   86.761442] vfio-pci 0000:0c:00.0: reset done
[   86.761480] vfio-pci 0000:0c:00.1: reset done
[   86.762392] vfio-pci 0000:0c:00.0: resetting
[   86.865462] vfio-pci 0000:0c:00.0: reset done
[   86.977360] virbr0: port 1(vnet1) entered learning state
[   88.993052] virbr0: port 1(vnet1) entered forwarding state
[   88.993057] virbr0: topology change detected, propagating
[  103.459114] kvm_amd: DE_CFG current = 0, WRMSR = 2
[  161.442032] virbr0: port 1(vnet1) entered disabled state // VM shut down

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [REGRESSION] from 74a0e79df68a8042fb84fd7207e57b70722cf825: VFIO PCI passthrough no longer works
  2024-12-10 20:33   ` Simon Pilkington
@ 2024-12-10 21:06     ` Tom Lendacky
  2024-12-10 22:43       ` Sean Christopherson
  0 siblings, 1 reply; 9+ messages in thread
From: Tom Lendacky @ 2024-12-10 21:06 UTC (permalink / raw)
  To: Simon Pilkington, Sean Christopherson; +Cc: linux-kernel, kvm, regressions

On 12/10/24 14:33, Simon Pilkington wrote:
> On 10/12/2024 16:47, Sean Christopherson wrote:
>> Can you run with the below to see what bits the guest is trying to set (or clear)?
>> We could get the same info via tracepoints, but this will likely be faster/easier.
>>
>> ---
>>  arch/x86/kvm/svm/svm.c | 12 +++++++++---
>>  1 file changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index dd15cc635655..5144d0283c9d 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -3195,11 +3195,14 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>>  	case MSR_AMD64_DE_CFG: {
>>  		u64 supported_de_cfg;
>>  
>> -		if (svm_get_feature_msr(ecx, &supported_de_cfg))
>> +		if (WARN_ON_ONCE(svm_get_feature_msr(ecx, &supported_de_cfg)))
>>  			return 1;
>>  
>> -		if (data & ~supported_de_cfg)
>> +		if (data & ~supported_de_cfg) {
>> +			pr_warn("DE_CFG supported = %llx, WRMSR = %llx\n",
>> +				supported_de_cfg, data);
>>  			return 1;
>> +		}
>>  
>>  		/*
>>  		 * Don't let the guest change the host-programmed value.  The
>> @@ -3207,8 +3210,11 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>>  		 * are completely unknown to KVM, and the one bit known to KVM
>>  		 * is simply a reflection of hardware capabilities.
>>  		 */
>> -		if (!msr->host_initiated && data != svm->msr_decfg)
>> +		if (!msr->host_initiated && data != svm->msr_decfg) {
>> +			pr_warn("DE_CFG current = %llx, WRMSR = %llx\n",
>> +				svm->msr_decfg, data);
>>  			return 1;
>> +		}
>>  
>>  		svm->msr_decfg = data;
>>  		break;
>>
>> base-commit: fac04efc5c793dccbd07e2d59af9f90b7fc0dca4
> 
> Relevant dmesg output with some context below. VM locked up as expected.
> 
> [   85.834971] vfio-pci 0000:0c:00.0: resetting
> [   85.937573] vfio-pci 0000:0c:00.0: reset done
> [   86.494210] vfio-pci 0000:0c:00.0: resetting
> [   86.494264] vfio-pci 0000:0c:00.1: resetting
> [   86.761442] vfio-pci 0000:0c:00.0: reset done
> [   86.761480] vfio-pci 0000:0c:00.1: reset done
> [   86.762392] vfio-pci 0000:0c:00.0: resetting
> [   86.865462] vfio-pci 0000:0c:00.0: reset done
> [   86.977360] virbr0: port 1(vnet1) entered learning state
> [   88.993052] virbr0: port 1(vnet1) entered forwarding state
> [   88.993057] virbr0: topology change detected, propagating
> [  103.459114] kvm_amd: DE_CFG current = 0, WRMSR = 2
> [  161.442032] virbr0: port 1(vnet1) entered disabled state // VM shut down

That is the MSR_AMD64_DE_CFG_LFENCE_SERIALIZE bit. Yeah, that actually
does change the behavior of LFENCE and isn't just a reflection of the
hardware.

Linux does set that bit on boot, too (if LFENCE always serializing isn't
advertised 8000_0021_EAX[2]), so I'm kind of surprised it didn't pop up
there.

I imagine that the above CPUID bit isn't set, so an attempt is made to
set the MSR bit.

Thanks,
Tom

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [REGRESSION] from 74a0e79df68a8042fb84fd7207e57b70722cf825: VFIO PCI passthrough no longer works
  2024-12-10 21:06     ` Tom Lendacky
@ 2024-12-10 22:43       ` Sean Christopherson
  2024-12-11  8:54         ` Simon Pilkington
  2024-12-11 14:37         ` Tom Lendacky
  0 siblings, 2 replies; 9+ messages in thread
From: Sean Christopherson @ 2024-12-10 22:43 UTC (permalink / raw)
  To: Tom Lendacky; +Cc: Simon Pilkington, linux-kernel, kvm, regressions

On Tue, Dec 10, 2024, Tom Lendacky wrote:
> On 12/10/24 14:33, Simon Pilkington wrote:
> > On 10/12/2024 16:47, Sean Christopherson wrote:
> >> Can you run with the below to see what bits the guest is trying to set (or clear)?
> >> We could get the same info via tracepoints, but this will likely be faster/easier.
> >>
> >> ---
> >>  arch/x86/kvm/svm/svm.c | 12 +++++++++---
> >>  1 file changed, 9 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> >> index dd15cc635655..5144d0283c9d 100644
> >> --- a/arch/x86/kvm/svm/svm.c
> >> +++ b/arch/x86/kvm/svm/svm.c
> >> @@ -3195,11 +3195,14 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
> >>  	case MSR_AMD64_DE_CFG: {
> >>  		u64 supported_de_cfg;
> >>  
> >> -		if (svm_get_feature_msr(ecx, &supported_de_cfg))
> >> +		if (WARN_ON_ONCE(svm_get_feature_msr(ecx, &supported_de_cfg)))
> >>  			return 1;
> >>  
> >> -		if (data & ~supported_de_cfg)
> >> +		if (data & ~supported_de_cfg) {
> >> +			pr_warn("DE_CFG supported = %llx, WRMSR = %llx\n",
> >> +				supported_de_cfg, data);
> >>  			return 1;
> >> +		}
> >>  
> >>  		/*
> >>  		 * Don't let the guest change the host-programmed value.  The
> >> @@ -3207,8 +3210,11 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
> >>  		 * are completely unknown to KVM, and the one bit known to KVM
> >>  		 * is simply a reflection of hardware capabilities.
> >>  		 */
> >> -		if (!msr->host_initiated && data != svm->msr_decfg)
> >> +		if (!msr->host_initiated && data != svm->msr_decfg) {
> >> +			pr_warn("DE_CFG current = %llx, WRMSR = %llx\n",
> >> +				svm->msr_decfg, data);
> >>  			return 1;
> >> +		}
> >>  
> >>  		svm->msr_decfg = data;
> >>  		break;
> >>
> >> base-commit: fac04efc5c793dccbd07e2d59af9f90b7fc0dca4
> > 
> > Relevant dmesg output with some context below. VM locked up as expected.
> > 
> > [   85.834971] vfio-pci 0000:0c:00.0: resetting
> > [   85.937573] vfio-pci 0000:0c:00.0: reset done
> > [   86.494210] vfio-pci 0000:0c:00.0: resetting
> > [   86.494264] vfio-pci 0000:0c:00.1: resetting
> > [   86.761442] vfio-pci 0000:0c:00.0: reset done
> > [   86.761480] vfio-pci 0000:0c:00.1: reset done
> > [   86.762392] vfio-pci 0000:0c:00.0: resetting
> > [   86.865462] vfio-pci 0000:0c:00.0: reset done
> > [   86.977360] virbr0: port 1(vnet1) entered learning state
> > [   88.993052] virbr0: port 1(vnet1) entered forwarding state
> > [   88.993057] virbr0: topology change detected, propagating
> > [  103.459114] kvm_amd: DE_CFG current = 0, WRMSR = 2
> > [  161.442032] virbr0: port 1(vnet1) entered disabled state // VM shut down
> 
> That is the MSR_AMD64_DE_CFG_LFENCE_SERIALIZE bit. Yeah, that actually
> does change the behavior of LFENCE and isn't just a reflection of the
> hardware.
> 
> Linux does set that bit on boot, too (if LFENCE always serializing isn't
> advertised 8000_0021_EAX[2]), so I'm kind of surprised it didn't pop up
> there.

Linux may be running afoul of this, but it would only become visible if someone
checked dmesg.  Even the "unsafe" MSR accesses in Linux gracefully handle faults
these days, the only symptom would be a WARN.

> I imagine that the above CPUID bit isn't set, so an attempt is made to
> set the MSR bit.

Yep.  And LFENCE_RDTSC _is_ supported, otherwise the supported_de_cfg check would
have failed.  Which means it's a-ok for the guest to set the bit, i.e. KVM won't
let the guest incorrectly think it's running on CPU for which LFENCE is serializing.

Unless you (Tom) disagree, I vote to simply drop the offending code, i.e. make
all supported bits fully writable from the guest.  KVM is firmly in the wrong here,
and I can't think of any reason to disallow the guest from clearing LFENCE_SERIALIZE.

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 6a350cee2f6c..5a82ead3bf0f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3201,15 +3201,6 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
                if (data & ~supported_de_cfg)
                        return 1;
 
-               /*
-                * Don't let the guest change the host-programmed value.  The
-                * MSR is very model specific, i.e. contains multiple bits that
-                * are completely unknown to KVM, and the one bit known to KVM
-                * is simply a reflection of hardware capabilities.
-                */
-               if (!msr->host_initiated && data != svm->msr_decfg)
-                       return 1;
-
                svm->msr_decfg = data;
                break;
        }


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [REGRESSION] from 74a0e79df68a8042fb84fd7207e57b70722cf825: VFIO PCI passthrough no longer works
  2024-12-10 22:43       ` Sean Christopherson
@ 2024-12-11  8:54         ` Simon Pilkington
  2024-12-11 14:37         ` Tom Lendacky
  1 sibling, 0 replies; 9+ messages in thread
From: Simon Pilkington @ 2024-12-11  8:54 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: linux-kernel, kvm, regressions, Tom Lendacky

On 10/12/2024 23:43, Sean Christopherson wrote:
> Unless you (Tom) disagree, I vote to simply drop the offending code, i.e. make
> all supported bits fully writable from the guest.  KVM is firmly in the wrong here,
> and I can't think of any reason to disallow the guest from clearing LFENCE_SERIALIZE.
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 6a350cee2f6c..5a82ead3bf0f 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3201,15 +3201,6 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>                 if (data & ~supported_de_cfg)
>                         return 1;
>  
> -               /*
> -                * Don't let the guest change the host-programmed value.  The
> -                * MSR is very model specific, i.e. contains multiple bits that
> -                * are completely unknown to KVM, and the one bit known to KVM
> -                * is simply a reflection of hardware capabilities.
> -                */
> -               if (!msr->host_initiated && data != svm->msr_decfg)
> -                       return 1;
> -
>                 svm->msr_decfg = data;
>                 break;
>         }
> 

This also produces a good kernel.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [REGRESSION] from 74a0e79df68a8042fb84fd7207e57b70722cf825: VFIO PCI passthrough no longer works
  2024-12-10 22:43       ` Sean Christopherson
  2024-12-11  8:54         ` Simon Pilkington
@ 2024-12-11 14:37         ` Tom Lendacky
  2024-12-11 17:42           ` Simon Pilkington
  1 sibling, 1 reply; 9+ messages in thread
From: Tom Lendacky @ 2024-12-11 14:37 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Simon Pilkington, linux-kernel, kvm, regressions

On 12/10/24 16:43, Sean Christopherson wrote:
> On Tue, Dec 10, 2024, Tom Lendacky wrote:
>> On 12/10/24 14:33, Simon Pilkington wrote:
>>> On 10/12/2024 16:47, Sean Christopherson wrote:
>>>> Can you run with the below to see what bits the guest is trying to set (or clear)?
>>>> We could get the same info via tracepoints, but this will likely be faster/easier.
>>>>
>>>> ---
>>>>  arch/x86/kvm/svm/svm.c | 12 +++++++++---
>>>>  1 file changed, 9 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>>>> index dd15cc635655..5144d0283c9d 100644
>>>> --- a/arch/x86/kvm/svm/svm.c
>>>> +++ b/arch/x86/kvm/svm/svm.c
>>>> @@ -3195,11 +3195,14 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>>>>  	case MSR_AMD64_DE_CFG: {
>>>>  		u64 supported_de_cfg;
>>>>  
>>>> -		if (svm_get_feature_msr(ecx, &supported_de_cfg))
>>>> +		if (WARN_ON_ONCE(svm_get_feature_msr(ecx, &supported_de_cfg)))
>>>>  			return 1;
>>>>  
>>>> -		if (data & ~supported_de_cfg)
>>>> +		if (data & ~supported_de_cfg) {
>>>> +			pr_warn("DE_CFG supported = %llx, WRMSR = %llx\n",
>>>> +				supported_de_cfg, data);
>>>>  			return 1;
>>>> +		}
>>>>  
>>>>  		/*
>>>>  		 * Don't let the guest change the host-programmed value.  The
>>>> @@ -3207,8 +3210,11 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>>>>  		 * are completely unknown to KVM, and the one bit known to KVM
>>>>  		 * is simply a reflection of hardware capabilities.
>>>>  		 */
>>>> -		if (!msr->host_initiated && data != svm->msr_decfg)
>>>> +		if (!msr->host_initiated && data != svm->msr_decfg) {
>>>> +			pr_warn("DE_CFG current = %llx, WRMSR = %llx\n",
>>>> +				svm->msr_decfg, data);
>>>>  			return 1;
>>>> +		}
>>>>  
>>>>  		svm->msr_decfg = data;
>>>>  		break;
>>>>
>>>> base-commit: fac04efc5c793dccbd07e2d59af9f90b7fc0dca4
>>>
>>> Relevant dmesg output with some context below. VM locked up as expected.
>>>
>>> [   85.834971] vfio-pci 0000:0c:00.0: resetting
>>> [   85.937573] vfio-pci 0000:0c:00.0: reset done
>>> [   86.494210] vfio-pci 0000:0c:00.0: resetting
>>> [   86.494264] vfio-pci 0000:0c:00.1: resetting
>>> [   86.761442] vfio-pci 0000:0c:00.0: reset done
>>> [   86.761480] vfio-pci 0000:0c:00.1: reset done
>>> [   86.762392] vfio-pci 0000:0c:00.0: resetting
>>> [   86.865462] vfio-pci 0000:0c:00.0: reset done
>>> [   86.977360] virbr0: port 1(vnet1) entered learning state
>>> [   88.993052] virbr0: port 1(vnet1) entered forwarding state
>>> [   88.993057] virbr0: topology change detected, propagating
>>> [  103.459114] kvm_amd: DE_CFG current = 0, WRMSR = 2
>>> [  161.442032] virbr0: port 1(vnet1) entered disabled state // VM shut down
>>
>> That is the MSR_AMD64_DE_CFG_LFENCE_SERIALIZE bit. Yeah, that actually
>> does change the behavior of LFENCE and isn't just a reflection of the
>> hardware.
>>
>> Linux does set that bit on boot, too (if LFENCE always serializing isn't
>> advertised 8000_0021_EAX[2]), so I'm kind of surprised it didn't pop up
>> there.
> 
> Linux may be running afoul of this, but it would only become visible if someone
> checked dmesg.  Even the "unsafe" MSR accesses in Linux gracefully handle faults
> these days, the only symptom would be a WARN.
> 
>> I imagine that the above CPUID bit isn't set, so an attempt is made to
>> set the MSR bit.
> 
> Yep.  And LFENCE_RDTSC _is_ supported, otherwise the supported_de_cfg check would
> have failed.  Which means it's a-ok for the guest to set the bit, i.e. KVM won't
> let the guest incorrectly think it's running on CPU for which LFENCE is serializing.
> 
> Unless you (Tom) disagree, I vote to simply drop the offending code, i.e. make
> all supported bits fully writable from the guest.  KVM is firmly in the wrong here,
> and I can't think of any reason to disallow the guest from clearing LFENCE_SERIALIZE.
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 6a350cee2f6c..5a82ead3bf0f 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3201,15 +3201,6 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>                 if (data & ~supported_de_cfg)
>                         return 1;
>  
> -               /*
> -                * Don't let the guest change the host-programmed value.  The
> -                * MSR is very model specific, i.e. contains multiple bits that
> -                * are completely unknown to KVM, and the one bit known to KVM
> -                * is simply a reflection of hardware capabilities.
> -                */
> -               if (!msr->host_initiated && data != svm->msr_decfg)
> -                       return 1;
> -

That works for me.

Thanks,
Tom

>                 svm->msr_decfg = data;
>                 break;
>         }
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [REGRESSION] from 74a0e79df68a8042fb84fd7207e57b70722cf825: VFIO PCI passthrough no longer works
  2024-12-11 14:37         ` Tom Lendacky
@ 2024-12-11 17:42           ` Simon Pilkington
  0 siblings, 0 replies; 9+ messages in thread
From: Simon Pilkington @ 2024-12-11 17:42 UTC (permalink / raw)
  To: Tom Lendacky, Sean Christopherson; +Cc: linux-kernel, kvm, regressions

On 11/12/2024 15:37, Tom Lendacky wrote:
> On 12/10/24 16:43, Sean Christopherson wrote:
>> On Tue, Dec 10, 2024, Tom Lendacky wrote:
>>> On 12/10/24 14:33, Simon Pilkington wrote:
>>>> On 10/12/2024 16:47, Sean Christopherson wrote:
>>>>> Can you run with the below to see what bits the guest is trying to set (or clear)?
>>>>> We could get the same info via tracepoints, but this will likely be faster/easier.
>>>>>
>>>>> ---
>>>>>  arch/x86/kvm/svm/svm.c | 12 +++++++++---
>>>>>  1 file changed, 9 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>>>>> index dd15cc635655..5144d0283c9d 100644
>>>>> --- a/arch/x86/kvm/svm/svm.c
>>>>> +++ b/arch/x86/kvm/svm/svm.c
>>>>> @@ -3195,11 +3195,14 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>>>>>  	case MSR_AMD64_DE_CFG: {
>>>>>  		u64 supported_de_cfg;
>>>>>  
>>>>> -		if (svm_get_feature_msr(ecx, &supported_de_cfg))
>>>>> +		if (WARN_ON_ONCE(svm_get_feature_msr(ecx, &supported_de_cfg)))
>>>>>  			return 1;
>>>>>  
>>>>> -		if (data & ~supported_de_cfg)
>>>>> +		if (data & ~supported_de_cfg) {
>>>>> +			pr_warn("DE_CFG supported = %llx, WRMSR = %llx\n",
>>>>> +				supported_de_cfg, data);
>>>>>  			return 1;
>>>>> +		}
>>>>>  
>>>>>  		/*
>>>>>  		 * Don't let the guest change the host-programmed value.  The
>>>>> @@ -3207,8 +3210,11 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>>>>>  		 * are completely unknown to KVM, and the one bit known to KVM
>>>>>  		 * is simply a reflection of hardware capabilities.
>>>>>  		 */
>>>>> -		if (!msr->host_initiated && data != svm->msr_decfg)
>>>>> +		if (!msr->host_initiated && data != svm->msr_decfg) {
>>>>> +			pr_warn("DE_CFG current = %llx, WRMSR = %llx\n",
>>>>> +				svm->msr_decfg, data);
>>>>>  			return 1;
>>>>> +		}
>>>>>  
>>>>>  		svm->msr_decfg = data;
>>>>>  		break;
>>>>>
>>>>> base-commit: fac04efc5c793dccbd07e2d59af9f90b7fc0dca4
>>>>
>>>> Relevant dmesg output with some context below. VM locked up as expected.
>>>>
>>>> [   85.834971] vfio-pci 0000:0c:00.0: resetting
>>>> [   85.937573] vfio-pci 0000:0c:00.0: reset done
>>>> [   86.494210] vfio-pci 0000:0c:00.0: resetting
>>>> [   86.494264] vfio-pci 0000:0c:00.1: resetting
>>>> [   86.761442] vfio-pci 0000:0c:00.0: reset done
>>>> [   86.761480] vfio-pci 0000:0c:00.1: reset done
>>>> [   86.762392] vfio-pci 0000:0c:00.0: resetting
>>>> [   86.865462] vfio-pci 0000:0c:00.0: reset done
>>>> [   86.977360] virbr0: port 1(vnet1) entered learning state
>>>> [   88.993052] virbr0: port 1(vnet1) entered forwarding state
>>>> [   88.993057] virbr0: topology change detected, propagating
>>>> [  103.459114] kvm_amd: DE_CFG current = 0, WRMSR = 2
>>>> [  161.442032] virbr0: port 1(vnet1) entered disabled state // VM shut down
>>>
>>> That is the MSR_AMD64_DE_CFG_LFENCE_SERIALIZE bit. Yeah, that actually
>>> does change the behavior of LFENCE and isn't just a reflection of the
>>> hardware.
>>>
>>> Linux does set that bit on boot, too (if LFENCE always serializing isn't
>>> advertised 8000_0021_EAX[2]), so I'm kind of surprised it didn't pop up
>>> there.
>>
>> Linux may be running afoul of this, but it would only become visible if someone
>> checked dmesg.  Even the "unsafe" MSR accesses in Linux gracefully handle faults
>> these days, the only symptom would be a WARN.
>>
>>> I imagine that the above CPUID bit isn't set, so an attempt is made to
>>> set the MSR bit.
>>
>> Yep.  And LFENCE_RDTSC _is_ supported, otherwise the supported_de_cfg check would
>> have failed.  Which means it's a-ok for the guest to set the bit, i.e. KVM won't
>> let the guest incorrectly think it's running on CPU for which LFENCE is serializing.
>>
>> Unless you (Tom) disagree, I vote to simply drop the offending code, i.e. make
>> all supported bits fully writable from the guest.  KVM is firmly in the wrong here,
>> and I can't think of any reason to disallow the guest from clearing LFENCE_SERIALIZE.
>>
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index 6a350cee2f6c..5a82ead3bf0f 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -3201,15 +3201,6 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>>                 if (data & ~supported_de_cfg)
>>                         return 1;
>>  
>> -               /*
>> -                * Don't let the guest change the host-programmed value.  The
>> -                * MSR is very model specific, i.e. contains multiple bits that
>> -                * are completely unknown to KVM, and the one bit known to KVM
>> -                * is simply a reflection of hardware capabilities.
>> -                */
>> -               if (!msr->host_initiated && data != svm->msr_decfg)
>> -                       return 1;
>> -
> 
> That works for me.
> 
> Thanks,
> Tom
> 
>>                 svm->msr_decfg = data;
>>                 break;
>>         }
>>

Thanks for the prompt response on this Sean & Tom.

Regards,
Simon

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-12-11 17:43 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-10  9:04 [REGRESSION] from 74a0e79df68a8042fb84fd7207e57b70722cf825: VFIO PCI passthrough no longer works Simon Pilkington
2024-12-10 15:47 ` Sean Christopherson
2024-12-10 16:15   ` Tom Lendacky
2024-12-10 20:33   ` Simon Pilkington
2024-12-10 21:06     ` Tom Lendacky
2024-12-10 22:43       ` Sean Christopherson
2024-12-11  8:54         ` Simon Pilkington
2024-12-11 14:37         ` Tom Lendacky
2024-12-11 17:42           ` Simon Pilkington

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox