All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: Sean Christopherson <seanjc@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org, "Rafael J. Wysocki" <rafael@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Kuppuswamy Sathyanarayanan
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	Elena Reshetova <elena.reshetova@intel.com>,
	Jun Nakajima <jun.nakajima@intel.com>,
	Rick Edgecombe <rick.p.edgecombe@intel.com>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	Ashish Kalra <ashish.kalra@amd.com>,
	Kai Huang <kai.huang@intel.com>, Baoquan He <bhe@redhat.com>,
	kexec@lists.infradead.org, linux-coco@lists.linux.dev,
	linux-kernel@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>
Subject: Re: [PATCHv2 04/13] x86/kvm: Do not try to disable kvmclock if it was not enabled
Date: Mon, 23 Oct 2023 10:45:30 +0200	[thread overview]
Message-ID: <87r0ll4thx.fsf@redhat.com> (raw)
In-Reply-To: <ZTKg9XMxeBQ36f5L@google.com>

Sean Christopherson <seanjc@google.com> writes:

> On Fri, Oct 20, 2023, Vitaly Kuznetsov wrote:
>> > ---
>> >  arch/x86/kernel/kvmclock.c | 12 ++++++++----
>> >  1 file changed, 8 insertions(+), 4 deletions(-)
>> >
>> > diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
>> > index fb8f52149be9..f2fff625576d 100644
>> > --- a/arch/x86/kernel/kvmclock.c
>> > +++ b/arch/x86/kernel/kvmclock.c
>> > @@ -24,8 +24,8 @@
>> >  
>> >  static int kvmclock __initdata = 1;
>> >  static int kvmclock_vsyscall __initdata = 1;
>> > -static int msr_kvm_system_time __ro_after_init = MSR_KVM_SYSTEM_TIME;
>> > -static int msr_kvm_wall_clock __ro_after_init = MSR_KVM_WALL_CLOCK;
>> > +static int msr_kvm_system_time __ro_after_init;
>> > +static int msr_kvm_wall_clock __ro_after_init;
>> >  static u64 kvm_sched_clock_offset __ro_after_init;
>> >  
>> >  static int __init parse_no_kvmclock(char *arg)
>> > @@ -195,7 +195,8 @@ static void kvm_setup_secondary_clock(void)
>> >  
>> >  void kvmclock_disable(void)
>> >  {
>> > -	native_write_msr(msr_kvm_system_time, 0, 0);
>> > +	if (msr_kvm_system_time)
>> > +		native_write_msr(msr_kvm_system_time, 0, 0);
>> >  }
>> >  
>> >  static void __init kvmclock_init_mem(void)
>> > @@ -294,7 +295,10 @@ void __init kvmclock_init(void)
>> >  	if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2)) {
>> >  		msr_kvm_system_time = MSR_KVM_SYSTEM_TIME_NEW;
>> >  		msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK_NEW;
>> > -	} else if (!kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) {
>> > +	} else if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) {
>> > +		msr_kvm_system_time = MSR_KVM_SYSTEM_TIME;
>> > +		msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK;
>> > +	} else {
>> >  		return;
>> >  	}
>> 
>> This should work, so
>> 
>> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> 
>> but my personal preference would be to change kvm_guest_cpu_offline()
>> to check KVM features explicitly instead of checking MSRs against '0'
>> at least becase it already does so for other features. Completely
>> untested:
>> 
>> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
>> index b8ab9ee5896c..1ee49c98e70a 100644
>> --- a/arch/x86/kernel/kvm.c
>> +++ b/arch/x86/kernel/kvm.c
>> @@ -454,7 +454,9 @@ static void kvm_guest_cpu_offline(bool shutdown)
>>         kvm_pv_disable_apf();
>>         if (!shutdown)
>>                 apf_task_wake_all();
>> -       kvmclock_disable();
>> +       if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2) ||
>> +           kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE))
>> +               kvmclock_disable();
>>  }
>
> That would result in an unnecessray WRMSR in the case where kvmclock is disabled
> on the command line.  It _should_ be benign given how the code is written, but
> it's not impossible to imagine a scenario where someone disabled kvmclock in the
> guest because of a hypervisor bug.  And the WRMSR would become a bogus write to
> MSR 0x0 if someone made a "cleanup" to set msr_kvm_system_time if and only if
> kvmclock is actually used, e.g. if someone made Kirill's change sans the check in
> kvmclock_disable().

True but we don't have such module params to disable other PV features so
e.g. KVM_FEATURE_PV_EOI/KVM_FEATURE_MIGRATION_CONTROL are written to
unconditionally. Wouldn't it be better to handle parameters like
'no-kvmclock' by clearing the feature bit in kvm_arch_para_features()'s
return value so all kvm_para_has_feature() calls for it just return
'false'? We can even do an umbreall "no-kvm-features=<mask>" to cover
all possible debug cases.

-- 
Vitaly


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

WARNING: multiple messages have this Message-ID (diff)
From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: Sean Christopherson <seanjc@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org, "Rafael J. Wysocki" <rafael@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Kuppuswamy Sathyanarayanan
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	Elena Reshetova <elena.reshetova@intel.com>,
	Jun Nakajima <jun.nakajima@intel.com>,
	Rick Edgecombe <rick.p.edgecombe@intel.com>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	Ashish Kalra <ashish.kalra@amd.com>,
	Kai Huang <kai.huang@intel.com>, Baoquan He <bhe@redhat.com>,
	kexec@lists.infradead.org, linux-coco@lists.linux.dev,
	linux-kernel@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>
Subject: Re: [PATCHv2 04/13] x86/kvm: Do not try to disable kvmclock if it was not enabled
Date: Mon, 23 Oct 2023 10:45:30 +0200	[thread overview]
Message-ID: <87r0ll4thx.fsf@redhat.com> (raw)
In-Reply-To: <ZTKg9XMxeBQ36f5L@google.com>

Sean Christopherson <seanjc@google.com> writes:

> On Fri, Oct 20, 2023, Vitaly Kuznetsov wrote:
>> > ---
>> >  arch/x86/kernel/kvmclock.c | 12 ++++++++----
>> >  1 file changed, 8 insertions(+), 4 deletions(-)
>> >
>> > diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
>> > index fb8f52149be9..f2fff625576d 100644
>> > --- a/arch/x86/kernel/kvmclock.c
>> > +++ b/arch/x86/kernel/kvmclock.c
>> > @@ -24,8 +24,8 @@
>> >  
>> >  static int kvmclock __initdata = 1;
>> >  static int kvmclock_vsyscall __initdata = 1;
>> > -static int msr_kvm_system_time __ro_after_init = MSR_KVM_SYSTEM_TIME;
>> > -static int msr_kvm_wall_clock __ro_after_init = MSR_KVM_WALL_CLOCK;
>> > +static int msr_kvm_system_time __ro_after_init;
>> > +static int msr_kvm_wall_clock __ro_after_init;
>> >  static u64 kvm_sched_clock_offset __ro_after_init;
>> >  
>> >  static int __init parse_no_kvmclock(char *arg)
>> > @@ -195,7 +195,8 @@ static void kvm_setup_secondary_clock(void)
>> >  
>> >  void kvmclock_disable(void)
>> >  {
>> > -	native_write_msr(msr_kvm_system_time, 0, 0);
>> > +	if (msr_kvm_system_time)
>> > +		native_write_msr(msr_kvm_system_time, 0, 0);
>> >  }
>> >  
>> >  static void __init kvmclock_init_mem(void)
>> > @@ -294,7 +295,10 @@ void __init kvmclock_init(void)
>> >  	if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2)) {
>> >  		msr_kvm_system_time = MSR_KVM_SYSTEM_TIME_NEW;
>> >  		msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK_NEW;
>> > -	} else if (!kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) {
>> > +	} else if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) {
>> > +		msr_kvm_system_time = MSR_KVM_SYSTEM_TIME;
>> > +		msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK;
>> > +	} else {
>> >  		return;
>> >  	}
>> 
>> This should work, so
>> 
>> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> 
>> but my personal preference would be to change kvm_guest_cpu_offline()
>> to check KVM features explicitly instead of checking MSRs against '0'
>> at least becase it already does so for other features. Completely
>> untested:
>> 
>> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
>> index b8ab9ee5896c..1ee49c98e70a 100644
>> --- a/arch/x86/kernel/kvm.c
>> +++ b/arch/x86/kernel/kvm.c
>> @@ -454,7 +454,9 @@ static void kvm_guest_cpu_offline(bool shutdown)
>>         kvm_pv_disable_apf();
>>         if (!shutdown)
>>                 apf_task_wake_all();
>> -       kvmclock_disable();
>> +       if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2) ||
>> +           kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE))
>> +               kvmclock_disable();
>>  }
>
> That would result in an unnecessray WRMSR in the case where kvmclock is disabled
> on the command line.  It _should_ be benign given how the code is written, but
> it's not impossible to imagine a scenario where someone disabled kvmclock in the
> guest because of a hypervisor bug.  And the WRMSR would become a bogus write to
> MSR 0x0 if someone made a "cleanup" to set msr_kvm_system_time if and only if
> kvmclock is actually used, e.g. if someone made Kirill's change sans the check in
> kvmclock_disable().

True but we don't have such module params to disable other PV features so
e.g. KVM_FEATURE_PV_EOI/KVM_FEATURE_MIGRATION_CONTROL are written to
unconditionally. Wouldn't it be better to handle parameters like
'no-kvmclock' by clearing the feature bit in kvm_arch_para_features()'s
return value so all kvm_para_has_feature() calls for it just return
'false'? We can even do an umbreall "no-kvm-features=<mask>" to cover
all possible debug cases.

-- 
Vitaly


  reply	other threads:[~2023-10-23  8:45 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-20 15:12 [PATCHv2 00/13] x86/tdx: Add kexec support Kirill A. Shutemov
2023-10-20 15:12 ` Kirill A. Shutemov
2023-10-20 15:12 ` [PATCHv2 01/13] x86/acpi: Extract ACPI MADT wakeup code into a separate file Kirill A. Shutemov
2023-10-20 15:12   ` Kirill A. Shutemov
2023-10-20 17:12   ` Kuppuswamy Sathyanarayanan
2023-10-20 17:12     ` Kuppuswamy Sathyanarayanan
2023-10-20 15:12 ` [PATCHv2 02/13] kernel/cpu: Add support for declaring CPU offlining not supported Kirill A. Shutemov
2023-10-20 15:12   ` Kirill A. Shutemov
2023-10-23  9:30   ` Huang, Kai
2023-10-23  9:30     ` Huang, Kai
2023-10-23 15:31     ` kirill.shutemov
2023-10-23 15:31       ` kirill.shutemov
2023-10-23 22:07       ` Huang, Kai
2023-10-23 22:07         ` Huang, Kai
2023-10-28 14:07         ` Thomas Gleixner
2023-10-28 14:07           ` Thomas Gleixner
2023-10-28 14:12   ` Thomas Gleixner
2023-10-28 14:12     ` Thomas Gleixner
2023-10-29 14:23     ` Kirill A. Shutemov
2023-10-29 14:23       ` Kirill A. Shutemov
2023-10-20 15:12 ` [PATCHv2 03/13] cpu/hotplug, x86/acpi: Disable CPU offlining for ACPI MADT wakeup Kirill A. Shutemov
2023-10-20 15:12   ` Kirill A. Shutemov
2023-10-20 15:12 ` [PATCHv2 04/13] x86/kvm: Do not try to disable kvmclock if it was not enabled Kirill A. Shutemov
2023-10-20 15:12   ` Kirill A. Shutemov
2023-10-20 15:32   ` Sean Christopherson
2023-10-20 15:32     ` Sean Christopherson
2023-10-20 15:41   ` Vitaly Kuznetsov
2023-10-20 15:41     ` Vitaly Kuznetsov
2023-10-20 17:07     ` Sean Christopherson
2023-10-20 17:07       ` Sean Christopherson
2023-10-23  8:45       ` Vitaly Kuznetsov [this message]
2023-10-23  8:45         ` Vitaly Kuznetsov
2023-10-23 14:40         ` Sean Christopherson
2023-10-23 14:40           ` Sean Christopherson
2023-10-20 15:12 ` [PATCHv2 05/13] x86/kexec: Keep CR4.MCE set during kexec for TDX guest Kirill A. Shutemov
2023-10-20 15:12   ` Kirill A. Shutemov
2023-10-20 15:12 ` [PATCHv2 06/13] x86/mm: Make x86_platform.guest.enc_status_change_*() return errno Kirill A. Shutemov
2023-10-20 15:12   ` Kirill A. Shutemov
2023-10-20 15:12 ` [PATCHv2 07/13] x86/mm: Return correct level from lookup_address() if pte is none Kirill A. Shutemov
2023-10-20 15:12   ` Kirill A. Shutemov
2023-10-20 15:12 ` [PATCHv2 08/13] x86/tdx: Account shared memory Kirill A. Shutemov
2023-10-20 15:12   ` Kirill A. Shutemov
2023-10-20 15:12 ` [PATCHv2 09/13] x86/tdx: Convert shared memory back to private on kexec Kirill A. Shutemov
2023-10-20 15:12   ` Kirill A. Shutemov
2023-10-20 15:12 ` [PATCHv2 10/13] x86/mm: Make e820_end_ram_pfn() cover E820_TYPE_ACPI ranges Kirill A. Shutemov
2023-10-20 15:12   ` Kirill A. Shutemov
2023-10-20 15:12 ` [PATCHv2 11/13] x86/acpi: Do not attempt to bring up secondary CPUs in kexec case Kirill A. Shutemov
2023-10-20 15:12   ` Kirill A. Shutemov
2023-10-24 10:14   ` Huang, Kai
2023-10-24 10:14     ` Huang, Kai
2023-10-24 13:59   ` Kuppuswamy Sathyanarayanan
2023-10-24 13:59     ` Kuppuswamy Sathyanarayanan
2023-10-27 13:01     ` Kirill A. Shutemov
2023-10-20 15:12 ` [PATCHv2 12/13] x86/acpi: Rename fields in acpi_madt_multiproc_wakeup structure Kirill A. Shutemov
2023-10-20 15:12   ` Kirill A. Shutemov
2023-10-24 10:18   ` Huang, Kai
2023-10-24 10:18     ` Huang, Kai
2023-10-24 12:46   ` Kuppuswamy Sathyanarayanan
2023-10-24 12:46     ` Kuppuswamy Sathyanarayanan
2023-10-20 15:12 ` [PATCHv2 13/13] x86/acpi: Add support for CPU offlining for ACPI MADT wakeup method Kirill A. Shutemov
2023-10-20 15:12   ` Kirill A. Shutemov
2023-10-24 10:11   ` Huang, Kai
2023-10-24 10:11     ` Huang, Kai
2023-10-25  3:50     ` Huang, Kai
2023-10-25  3:50       ` Huang, Kai
2023-10-27 11:58     ` kirill.shutemov
2023-10-27 11:58       ` kirill.shutemov
2023-10-29 17:31   ` Thomas Gleixner
2023-10-29 17:31     ` Thomas Gleixner
2023-11-01 13:26     ` Kirill A. Shutemov
2023-11-01 13:26       ` Kirill A. Shutemov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87r0ll4thx.fsf@redhat.com \
    --to=vkuznets@redhat.com \
    --cc=adrian.hunter@intel.com \
    --cc=ashish.kalra@amd.com \
    --cc=bhe@redhat.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=elena.reshetova@intel.com \
    --cc=jun.nakajima@intel.com \
    --cc=kai.huang@intel.com \
    --cc=kexec@lists.infradead.org \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-coco@lists.linux.dev \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rafael@kernel.org \
    --cc=rick.p.edgecombe@intel.com \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=seanjc@google.com \
    --cc=tglx@linutronix.de \
    --cc=thomas.lendacky@amd.com \
    --cc=wanpengli@tencent.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.