* Re: [PATCH v2] PCI: PM: Move to D0 before calling pci_legacy_resume_early()
From: Bjorn Helgaas @ 2019-10-07 13:24 UTC (permalink / raw)
To: Dexuan Cui
Cc: lorenzo.pieralisi@arm.com, linux-pci@vger.kernel.org,
Michael Kelley, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org,
driverdev-devel@linuxdriverproject.org, Sasha Levin,
Haiyang Zhang, KY Srinivasan, olaf@aepfle.de, apw@canonical.com,
jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com,
Stephen Hemminger, jackm@mellanox.com
In-Reply-To: <KU1P153MB016637CAEAD346F0AA8E3801BFAD0@KU1P153MB0166.APCP153.PROD.OUTLOOK.COM>
On Wed, Aug 14, 2019 at 01:06:55AM +0000, Dexuan Cui wrote:
>
> In pci_legacy_suspend_late(), the device state is moved to PCI_UNKNOWN.
>
> In pci_pm_thaw_noirq(), the state is supposed to be moved back to PCI_D0,
> but the current code misses the pci_legacy_resume_early() path, so the
> state remains in PCI_UNKNOWN in that path. As a result, in the resume
> phase of hibernation, this causes an error for the Mellanox VF driver,
> which fails to enable MSI-X because pci_msi_supported() is false due
> to dev->current_state != PCI_D0:
>
> mlx4_core a6d1:00:02.0: Detected virtual function - running in slave mode
> mlx4_core a6d1:00:02.0: Sending reset
> mlx4_core a6d1:00:02.0: Sending vhcr0
> mlx4_core a6d1:00:02.0: HCA minimum page size:512
> mlx4_core a6d1:00:02.0: Timestamping is not supported in slave mode
> mlx4_core a6d1:00:02.0: INTx is not supported in multi-function mode, aborting
> PM: dpm_run_callback(): pci_pm_thaw+0x0/0xd7 returns -95
> PM: Device a6d1:00:02.0 failed to thaw: error -95
>
> To be more accurate, the "resume" phase means the "thaw" callbacks which
> run before the system enters hibernation: when the user runs the command
> "echo disk > /sys/power/state" for hibernation, first the kernel "freezes"
> all the devices and creates a hibernation image, then the kernel "thaws"
> the devices including the disk/NIC, writes the memory to the disk, and
> powers down. This patch fixes the error message for the Mellanox VF driver
> in this phase.
>
> When the system starts again, a fresh kernel starts to run, and when the
> kernel detects that a hibernation image was saved, the kernel "quiesces"
> the devices, and then "restores" the devices from the saved image. In this
> path:
> device_resume_noirq() -> ... ->
> pci_pm_restore_noirq() ->
> pci_pm_default_resume_early() ->
> pci_power_up() moves the device states back to PCI_D0. This path is
> not broken and doesn't need my patch.
>
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
This looks like a bugfix for 5839ee7389e8 ("PCI / PM: Force devices to
D0 in pci_pm_thaw_noirq()") so maybe it should be marked for stable as
5839ee7389e8 was?
Rafael, could you confirm?
> ---
>
> changes in v2:
> Updated the changelog with more details.
>
> drivers/pci/pci-driver.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 36dbe960306b..27dfc68db9e7 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1074,15 +1074,16 @@ static int pci_pm_thaw_noirq(struct device *dev)
> return error;
> }
>
> - if (pci_has_legacy_pm_support(pci_dev))
> - return pci_legacy_resume_early(dev);
> -
> /*
> * pci_restore_state() requires the device to be in D0 (because of MSI
> * restoration among other things), so force it into D0 in case the
> * driver's "freeze" callbacks put it into a low-power state directly.
> */
> pci_set_power_state(pci_dev, PCI_D0);
> +
> + if (pci_has_legacy_pm_support(pci_dev))
> + return pci_legacy_resume_early(dev);
> +
> pci_restore_state(pci_dev);
>
> if (drv && drv->pm && drv->pm->thaw_noirq)
> --
> 2.19.1
>
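For readers following the reordering, its effect can be sketched as a small user-space model. This is a simplified stand-in and not the kernel code: `struct model_dev`, `thaw_noirq_old()` and `thaw_noirq_new()` are hypothetical names, and the only behaviour modelled is whether the legacy path returns before or after the device is forced to D0.

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel's power state values. */
enum model_power_state { MODEL_D0, MODEL_UNKNOWN };

struct model_dev {
	enum model_power_state current_state;
	bool has_legacy_pm;	/* models pci_has_legacy_pm_support() */
};

/* Models the MSI-X precondition from the changelog: D0 is required. */
bool model_msi_supported(const struct model_dev *dev)
{
	return dev->current_state == MODEL_D0;
}

/* Before the patch: the legacy path returns without moving to D0. */
void thaw_noirq_old(struct model_dev *dev)
{
	if (dev->has_legacy_pm)
		return;		/* pci_legacy_resume_early() would run here */
	dev->current_state = MODEL_D0;
}

/* After the patch: force D0 first, then take the legacy path. */
void thaw_noirq_new(struct model_dev *dev)
{
	dev->current_state = MODEL_D0;
	if (dev->has_legacy_pm)
		return;		/* pci_legacy_resume_early() would run here */
}
```

In this model a legacy-PM device that starts in the unknown state stays there after `thaw_noirq_old()`, so the MSI-X check fails, while `thaw_noirq_new()` leaves it in D0.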
* [PATCH 5.2 098/137] PCI: pci-hyperv: Fix build errors on non-SYSFS config
From: Greg Kroah-Hartman @ 2019-10-06 17:21 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Randy Dunlap, Lorenzo Pieralisi,
Haiyang Zhang, Matthew Wilcox, Jake Oshins, K. Y. Srinivasan,
Stephen Hemminger, Stephen Hemminger, Sasha Levin, Bjorn Helgaas,
linux-pci, linux-hyperv, Dexuan Cui
In-Reply-To: <20191006171209.403038733@linuxfoundation.org>
From: Randy Dunlap <rdunlap@infradead.org>
[ Upstream commit f58ba5e3f6863ea4486952698898848a6db726c2 ]
Fix build errors when building almost-allmodconfig but with SYSFS
not set (not enabled). Fixes these build errors:
ERROR: "pci_destroy_slot" [drivers/pci/controller/pci-hyperv.ko] undefined!
ERROR: "pci_create_slot" [drivers/pci/controller/pci-hyperv.ko] undefined!
drivers/pci/slot.o is only built when SYSFS is enabled, so
pci-hyperv.o has an implicit dependency on SYSFS.
Make that explicit.
Also, depending on X86 && X86_64 is not needed, so just change that
to depend on X86_64.
Fixes: a15f2c08c708 ("PCI: hv: support reporting serial number as slot information")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Jake Oshins <jakeo@microsoft.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Cc: linux-hyperv@vger.kernel.org
Cc: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/pci/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 2ab92409210af..297bf928d6522 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -181,7 +181,7 @@ config PCI_LABEL
config PCI_HYPERV
tristate "Hyper-V PCI Frontend"
- depends on X86 && HYPERV && PCI_MSI && PCI_MSI_IRQ_DOMAIN && X86_64
+ depends on X86_64 && HYPERV && PCI_MSI && PCI_MSI_IRQ_DOMAIN && SYSFS
help
The PCI device frontend driver allows the kernel to import arbitrary
PCI devices from a PCI backend to support PCI driver domains.
--
2.20.1
* [PATCH 5.3 091/166] PCI: pci-hyperv: Fix build errors on non-SYSFS config
From: Greg Kroah-Hartman @ 2019-10-06 17:20 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Randy Dunlap, Lorenzo Pieralisi,
Haiyang Zhang, Matthew Wilcox, Jake Oshins, K. Y. Srinivasan,
Stephen Hemminger, Stephen Hemminger, Sasha Levin, Bjorn Helgaas,
linux-pci, linux-hyperv, Dexuan Cui
In-Reply-To: <20191006171212.850660298@linuxfoundation.org>
From: Randy Dunlap <rdunlap@infradead.org>
[ Upstream commit f58ba5e3f6863ea4486952698898848a6db726c2 ]
Fix build errors when building almost-allmodconfig but with SYSFS
not set (not enabled). Fixes these build errors:
ERROR: "pci_destroy_slot" [drivers/pci/controller/pci-hyperv.ko] undefined!
ERROR: "pci_create_slot" [drivers/pci/controller/pci-hyperv.ko] undefined!
drivers/pci/slot.o is only built when SYSFS is enabled, so
pci-hyperv.o has an implicit dependency on SYSFS.
Make that explicit.
Also, depending on X86 && X86_64 is not needed, so just change that
to depend on X86_64.
Fixes: a15f2c08c708 ("PCI: hv: support reporting serial number as slot information")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Jake Oshins <jakeo@microsoft.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Cc: linux-hyperv@vger.kernel.org
Cc: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/pci/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 2ab92409210af..297bf928d6522 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -181,7 +181,7 @@ config PCI_LABEL
config PCI_HYPERV
tristate "Hyper-V PCI Frontend"
- depends on X86 && HYPERV && PCI_MSI && PCI_MSI_IRQ_DOMAIN && X86_64
+ depends on X86_64 && HYPERV && PCI_MSI && PCI_MSI_IRQ_DOMAIN && SYSFS
help
The PCI device frontend driver allows the kernel to import arbitrary
PCI devices from a PCI backend to support PCI driver domains.
--
2.20.1
* Re: [PATCH v4 3/4] xen: Mark "xen_nopvspin" parameter obsolete
From: Zhenzhong Duan @ 2019-10-06 7:52 UTC (permalink / raw)
To: Boris Ostrovsky, linux-kernel
Cc: vkuznets, linux-hyperv, kvm, kys, haiyangz, sthemmin, sashal,
tglx, mingo, bp, pbonzini, rkrcmar, sean.j.christopherson,
wanpengli, jmattson, joro, jgross, sstabellini, peterz,
Jonathan Corbet, H. Peter Anvin
In-Reply-To: <2c644c4a-f562-3271-ce0b-e60a44d82d89@oracle.com>
On 2019/10/4 22:57, Boris Ostrovsky wrote:
> On 10/3/19 10:02 AM, Zhenzhong Duan wrote:
>> Map "xen_nopvspin" to "nopvspin", fix stale description of "xen_nopvspin"
>> as we use qspinlock now.
>>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
>> Cc: Jonathan Corbet <corbet@lwn.net>
>> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> Cc: Juergen Gross <jgross@suse.com>
>> Cc: Stefano Stabellini <sstabellini@kernel.org>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: Borislav Petkov <bp@alien8.de>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>
> with a small nit
>
>> void __init xen_init_spinlocks(void)
>> {
>> + if (nopvspin)
>> + xen_pvspin = false;
>>
>> /* Don't need to use pvqspinlock code if there is only 1 vCPU. */
>> if (num_possible_cpus() == 1)
> I'd fold the change into this 'if' statement, I think it will still be
> clear what the comment refers to.
Good suggestion, will do that. Thanks
Zhenzhong
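One way to read the suggestion is sketched below as a stand-alone model; it is not the actual follow-up patch. `model_xen_init_spinlocks()` and its parameters are hypothetical stand-ins for the kernel's globals and for num_possible_cpus(), the default value of the flag is an assumption, and the rest of the real function is omitted.

```c
#include <stdbool.h>

/* Stand-in for the kernel's xen_pvspin flag, assumed to default to true. */
bool xen_pvspin = true;

/*
 * Model of the folded check: a single 'if' covers both the one-vCPU
 * case and the "nopvspin" command line parameter.
 */
void model_xen_init_spinlocks(bool nopvspin, unsigned int possible_cpus)
{
	/* Don't need to use pvqspinlock code if there is only 1 vCPU. */
	if (possible_cpus == 1 || nopvspin)
		xen_pvspin = false;
}
```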
* Re: [PATCH v4 1/4] x86/kvm: Add "nopvspin" parameter to disable PV spinlocks
From: Zhenzhong Duan @ 2019-10-06 7:49 UTC (permalink / raw)
To: Boris Ostrovsky, linux-kernel
Cc: vkuznets, linux-hyperv, kvm, kys, haiyangz, sthemmin, sashal,
tglx, mingo, bp, pbonzini, rkrcmar, sean.j.christopherson,
wanpengli, jmattson, joro, jgross, sstabellini, peterz,
Jonathan Corbet, H. Peter Anvin, Will Deacon
In-Reply-To: <26ef7beb-dad0-13c9-fc2f-217a5e046e4d@oracle.com>
On 2019/10/4 22:52, Boris Ostrovsky wrote:
> On 10/3/19 10:02 AM, Zhenzhong Duan wrote:
>> void __init kvm_spinlock_init(void)
>> {
>> - /* Does host kernel support KVM_FEATURE_PV_UNHALT? */
>> - if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
>> - return;
>> -
>> - if (kvm_para_has_hint(KVM_HINTS_REALTIME))
>> + /*
>> + * Don't use the pvqspinlock code if no KVM_FEATURE_PV_UNHALT feature
>> + * support, or there is REALTIME hints or only 1 vCPU.
>> + */
>> + if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT) ||
>> + kvm_para_has_hint(KVM_HINTS_REALTIME) ||
>> + num_possible_cpus() == 1) {
>> + pr_info("PV spinlocks disabled\n");
>> return;
>> + }
>>
>> - /* Don't use the pvqspinlock code if there is only 1 vCPU. */
>> - if (num_possible_cpus() == 1)
>> + if (nopvspin) {
>> + pr_info("PV spinlocks disabled forced by \"nopvspin\" parameter.\n");
>> + static_branch_disable(&virt_spin_lock_key);
> Would it make sense to bring here the other site where the key is
> disabled (in kvm_smp_prepare_cpus())?
Thanks for pointing that out, I'll do it. I'm just not sure whether it should go in a
separate patch, because that code has some history:
It was originally here, and was then moved to kvm_smp_prepare_cpus() by this commit:
34226b6b ("KVM: X86: Fix setup the virt_spin_lock_key before static key get initialized")
which fixed the issue of jump_label_init() being called too late.
Then 8990cac6 ("x86/jump_label: Initialize static branching early") moved jump_label_init()
earlier, so commit 34226b6b could be reverted.
>
> (and, in fact, shouldn't all of the checks that result in early return
> above disable the key?)
I think the key should be enabled for the !kvm_para_has_feature(KVM_FEATURE_PV_UNHALT)
case: qspinlock is a fair lock, so there is a lock holder preemption issue, and
virt_spin_lock() is an optimization for that. Imagine one pCPU running 10 vCPUs of the
same guest, all contending for the same lock.
For the kvm_para_has_hint(KVM_HINTS_REALTIME) case, the hypervisor hints that there is
no preemption, so we should disable virt_spin_lock_key and use the native qspinlock.
For the UP case, we don't care about the value of virt_spin_lock_key.
For the nopvspin case, we intentionally want to measure native qspinlock performance,
compare it with PV qspinlock, etc., so the virt_spin_lock() optimization should be disabled.
Let me know if anything is wrong with the above understanding. Thanks
Zhenzhong
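The four cases above can be summarised as a small decision function. This is a sketch of the reasoning in this mail only, not of the patch: `struct model_guest`, `enum model_lock` and `model_pick_lock()` are hypothetical names, and the precedence when several conditions hold at once is an assumption that the thread does not settle.

```c
#include <stdbool.h>

/* Hypothetical summary of the guest configuration. */
struct model_guest {
	bool has_pv_unhalt;	/* KVM_FEATURE_PV_UNHALT advertised */
	bool realtime_hint;	/* KVM_HINTS_REALTIME set */
	bool nopvspin;		/* "nopvspin" given on the command line */
	unsigned int cpus;	/* models num_possible_cpus() */
};

enum model_lock {
	MODEL_NATIVE_QSPINLOCK,	/* virt_spin_lock_key disabled or irrelevant */
	MODEL_VIRT_SPIN_LOCK,	/* virt_spin_lock_key left enabled */
	MODEL_PV_QSPINLOCK	/* PV spinlocks in use */
};

/* Returns the lock flavour the guest ends up with in this model. */
enum model_lock model_pick_lock(const struct model_guest *g)
{
	if (g->nopvspin || g->realtime_hint)
		return MODEL_NATIVE_QSPINLOCK;	/* key explicitly disabled */
	if (g->cpus == 1)
		return MODEL_NATIVE_QSPINLOCK;	/* key value does not matter */
	if (!g->has_pv_unhalt)
		return MODEL_VIRT_SPIN_LOCK;	/* keep the optimization */
	return MODEL_PV_QSPINLOCK;
}
```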
* RE: [PATCH v4 4/4] x86/hyperv: Mark "hv_nopvspin" parameter obsolete
From: Michael Kelley @ 2019-10-05 17:07 UTC (permalink / raw)
To: Zhenzhong Duan, linux-kernel@vger.kernel.org
Cc: vkuznets, linux-hyperv@vger.kernel.org, kvm@vger.kernel.org,
KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
sashal@kernel.org, tglx@linutronix.de, mingo@redhat.com,
bp@alien8.de, pbonzini@redhat.com, rkrcmar@redhat.com,
sean.j.christopherson@intel.com, wanpengli@tencent.com,
jmattson@google.com, joro@8bytes.org, boris.ostrovsky@oracle.com,
jgross@suse.com, sstabellini@kernel.org, peterz@infradead.org,
Jonathan Corbet, H. Peter Anvin
In-Reply-To: <1570111335-12731-5-git-send-email-zhenzhong.duan@oracle.com>
From: Zhenzhong Duan <zhenzhong.duan@oracle.com> Sent: Thursday, October 3, 2019 7:02 AM
>
> Map "hv_nopvspin" to "nopvspin".
>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
> Cc: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: Stephen Hemminger <sthemmin@microsoft.com>
> Cc: Sasha Levin <sashal@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> ---
> Documentation/admin-guide/kernel-parameters.txt | 6 +++++-
> arch/x86/hyperv/hv_spinlock.c | 4 ++++
> 2 files changed, 9 insertions(+), 1 deletion(-)
>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
* RE: [PATCH v2] x86/hyperv: make vapic support x2apic mode
From: Michael Kelley @ 2019-10-04 22:33 UTC (permalink / raw)
To: Roman Kagan
Cc: vkuznets, kvm@vger.kernel.org, Tianyu Lan, Joerg Roedel,
KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
x86@kernel.org, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org
In-Reply-To: <20191004091855.GA26970@rkaganb.sw.ru>
From: Roman Kagan <rkagan@virtuozzo.com> Sent: Friday, October 4, 2019 2:19 AM
>
> On Fri, Oct 04, 2019 at 03:01:51AM +0000, Michael Kelley wrote:
> > From: Roman Kagan <rkagan@virtuozzo.com> Sent: Thursday, October 3, 2019 5:53 AM
> > > >
> > > > AFAIU you're trying to mirror native_x2apic_icr_write() here but this is
> > > > different from what hv_apic_icr_write() does
> > > > (SET_APIC_DEST_FIELD(id)).
> > >
> > > Right. In xapic mode the ICR2 aka the high 4 bytes of ICR is programmed
> > > with the destination id in the highest byte; in x2apic mode the whole
> > > ICR2 is set to the 32bit destination id.
> > >
> > > > Is it actually correct? (I think you've tested this and it is but)
> > >
> > > As I wrote in the commit log, I haven't tested it in the sense that I
> > > ran a Linux guest in a Hyper-V VM exposing x2apic to the guest, because
> > > I didn't manage to configure it to do so. OTOH I did run a Windows
> > > guest in QEMU/KVM with hv_apic and x2apic enabled and saw it write
> > > destination ids unshifted to the ICR2 part of ICR, so I assume it's
> > > correct.
> > >
> > > > Michael, could you please shed some light here?
> > >
> > > Would be appreciated, indeed.
> > >
> >
> > The newest version of Hyper-V provides an x2apic in a guest VM when the
> > number of vCPUs in the VM is > 240. This version of Hyper-V is beginning
> > to be deployed in Azure to enable the M416v2 VM size, but the functionality
> > is not yet available for the on-premises version of Hyper-V. However, I can
> > test this configuration internally with the above patch -- give me a few days.
> >
> > An additional complication is that when running on Intel processors that offer
> > vAPIC functionality, the Hyper-V "hints" value does *not* recommend using the
> > MSR-based APIC accesses. In this case, memory-mapped access to the x2apic
> > registers is faster than the synthetic MSRs.
>
> I guess you mean "using regular x2apic MSRs compared to the synthetic
> MSRs".
Yes, of course you are correct.
> Indeed they do essentially the same thing, and there's no reason
> for one set of MSRs to be significantly faster than the other. However,
> hv_apic_eoi_write makes use of "apic assists" aka lazy EOI which is
> certainly a win, and I'm not sure if it works without hv_apic.
>
I've checked with the Hyper-V people and the presence of vAPIC makes
a difference. If vAPIC is present in the hardware:
1) Hyper-V does not set the HV_X64_APIC_ACCESS_RECOMMENDED flag
2) The architectural MSRs should be used instead of the Hyper-V
synthetic MSRs, as they are significantly faster. The architectural
MSRs do not cause a VMEXIT because they are handled entirely by
the vAPIC microcode in the CPU. The synthetic MSRs do cause a VMEXIT.
3) The lazy EOI functionality should not be used
If vAPIC is not present in the hardware:
1) Hyper-V will set HV_X64_APIC_ACCESS_RECOMMENDED
2) Either set of MSRs has about the same performance, but we
should use the synthetic MSRs.
3) The lazy EOI functionality has some value and should be used
The same will apply to the AMD AVIC in some Hyper-V updates that
are coming soon.
So I think your code makes sense given the above information. By
Monday I'll try to test it on a Hyper-V guest VM with x2APIC.
Michael
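The two cases listed above reduce to a single input, the recommendation flag. A minimal sketch, assuming the guest simply follows the hint: `struct model_apic_choice` and `model_choose_apic()` are hypothetical names, and the boolean parameter stands in for testing HV_X64_APIC_ACCESS_RECOMMENDED.

```c
#include <stdbool.h>

/* Outcome of the guest's APIC access decision in this model. */
struct model_apic_choice {
	bool use_synthetic_msrs;	/* Hyper-V synthetic MSRs vs. architectural MSRs */
	bool use_lazy_eoi;		/* "APIC assist" lazy EOI */
};

/*
 * 'access_recommended' models the HV_X64_APIC_ACCESS_RECOMMENDED hint,
 * which per the summary above is set only when hardware vAPIC is absent.
 */
struct model_apic_choice model_choose_apic(bool access_recommended)
{
	struct model_apic_choice c;

	c.use_synthetic_msrs = access_recommended;
	c.use_lazy_eoi = access_recommended;
	return c;
}
```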
* RE: [PATCH 1/2] x86/hyperv: Allow guests to enable InvariantTSC
From: Michael Kelley @ 2019-10-04 22:05 UTC (permalink / raw)
To: vkuznets, Andrea Parri, linux-kernel@vger.kernel.org,
linux-hyperv@vger.kernel.org, x86@kernel.org
Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
Andrea Parri
In-Reply-To: <87k19k1mad.fsf@vitty.brq.redhat.com>
From: Vitaly Kuznetsov <vkuznets@redhat.com> Sent: Friday, October 4, 2019 9:57 AM
>
> Andrea Parri <parri.andrea@gmail.com> writes:
>
> > If the hardware supports TSC scaling, Hyper-V will set bit 15 of the
> > HV_PARTITION_PRIVILEGE_MASK in guest VMs with a compatible Hyper-V
> > configuration version. Bit 15 corresponds to the
> > AccessTscInvariantControls privilege. If this privilege bit is set,
> > guests can access the HvSyntheticInvariantTscControl MSR: guests can
> > set bit 0 of this synthetic MSR to enable the InvariantTSC feature.
> > After setting the synthetic MSR, CPUID will enumerate support for
> > InvariantTSC.
>
> I tried getting more information from TLFS but as of 5.0C this feature
> is not described there. I'm really interested in why this additional
> interface is needed, e.g. why can't Hyper-V just set InvariantTSC
> unconditionally when TSC scaling is supported?
>
Yes, this is very new functionality that is not yet available in a released
version of Hyper-V. And as you know, the Hyper-V TLFS has gotten
woefully out-of-date. :-(
Your question is the same question I asked. The reason given by
Hyper-V is to take the more cautious approach of not "automatically"
giving VMs an InvariantTSC due to updating the underlying Hyper-V
version. Instead, guest VMs must have been explicitly coded to take
advantage of the new InvariantTSC feature. It's not clear to me how
much of this caution is driven by Windows guests vs. Linux or FreeBSD
guests, but it is what it is.
Having to explicitly enable the InvariantTSC does give the Linux code
the opportunity to be a bit cleaner by doing things like not marking
the TSC as unstable when the InvariantTSC feature is present, and to
mark the TSC as reliable so we don't try to do TSC synchronization
(which Hyper-V does not want guests to try to do).
Michael
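The opt-in flow described above can be modelled in user space as follows. The bit number and MSR index are taken from the patch under discussion; everything prefixed `model_` is a hypothetical stand-in (a real guest would use wrmsrl(), setup_force_cpu_cap() and mark_tsc_unstable() instead of these mocks).

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_BIT(n)				(1ULL << (n))
/* Values taken from the patch under discussion. */
#define HV_X64_ACCESS_TSC_INVARIANT		MODEL_BIT(15)
#define HV_X64_MSR_TSC_INVARIANT_CONTROL	0x40000118

/* Mock state standing in for the MSR and the kernel's TSC flags. */
uint64_t model_tsc_invariant_control;
bool model_tsc_reliable;
bool model_tsc_unstable;

/* Mock of an MSR write; only the invariant-control MSR is modelled. */
void model_wrmsrl(uint32_t msr, uint64_t val)
{
	if (msr == HV_X64_MSR_TSC_INVARIANT_CONTROL)
		model_tsc_invariant_control = val;
}

/* Opt in to InvariantTSC only when the privilege bit is present. */
void model_init_tsc(uint64_t features)
{
	if (features & HV_X64_ACCESS_TSC_INVARIANT) {
		model_wrmsrl(HV_X64_MSR_TSC_INVARIANT_CONTROL, 0x1);
		model_tsc_reliable = true;
	} else {
		model_tsc_unstable = true;
	}
}
```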
* Re: [PATCH 1/2] x86/hyperv: Allow guests to enable InvariantTSC
From: Vitaly Kuznetsov @ 2019-10-04 16:57 UTC (permalink / raw)
To: Andrea Parri, linux-kernel, linux-hyperv, x86
Cc: K . Y . Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
Michael Kelley, Andrea Parri
In-Reply-To: <20191003155200.22022-1-parri.andrea@gmail.com>
Andrea Parri <parri.andrea@gmail.com> writes:
> If the hardware supports TSC scaling, Hyper-V will set bit 15 of the
> HV_PARTITION_PRIVILEGE_MASK in guest VMs with a compatible Hyper-V
> configuration version. Bit 15 corresponds to the
> AccessTscInvariantControls privilege. If this privilege bit is set,
> guests can access the HvSyntheticInvariantTscControl MSR: guests can
> set bit 0 of this synthetic MSR to enable the InvariantTSC feature.
> After setting the synthetic MSR, CPUID will enumerate support for
> InvariantTSC.
I tried getting more information from TLFS but as of 5.0C this feature
is not described there. I'm really interested in why this additional
interface is needed, e.g. why can't Hyper-V just set InvariantTSC
unconditionally when TSC scaling is supported?
>
> Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
> ---
> arch/x86/include/asm/hyperv-tlfs.h | 5 +++++
> arch/x86/kernel/cpu/mshyperv.c | 7 ++++++-
> 2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> index 7741e211f7f51..5f10f7f2098db 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -86,6 +86,8 @@
> #define HV_X64_ACCESS_FREQUENCY_MSRS BIT(11)
> /* AccessReenlightenmentControls privilege */
> #define HV_X64_ACCESS_REENLIGHTENMENT BIT(13)
> +/* AccessTscInvariantControls privilege */
> +#define HV_X64_ACCESS_TSC_INVARIANT BIT(15)
>
> /*
> * Feature identification: indicates which flags were specified at partition
> @@ -278,6 +280,9 @@
> #define HV_X64_MSR_TSC_EMULATION_CONTROL 0x40000107
> #define HV_X64_MSR_TSC_EMULATION_STATUS 0x40000108
>
> +/* TSC invariant control */
> +#define HV_X64_MSR_TSC_INVARIANT_CONTROL 0x40000118
> +
> /*
> * Declare the MSR used to setup pages used to communicate with the hypervisor.
> */
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 267daad8c0360..105844d542e5c 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -286,7 +286,12 @@ static void __init ms_hyperv_init_platform(void)
> machine_ops.shutdown = hv_machine_shutdown;
> machine_ops.crash_shutdown = hv_machine_crash_shutdown;
> #endif
> - mark_tsc_unstable("running on Hyper-V");
> + if (ms_hyperv.features & HV_X64_ACCESS_TSC_INVARIANT) {
> + wrmsrl(HV_X64_MSR_TSC_INVARIANT_CONTROL, 0x1);
> + setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
> + } else {
> + mark_tsc_unstable("running on Hyper-V");
> + }
>
> /*
> * Generation 2 instances don't support reading the NMI status from
--
Vitaly
* Re: [PATCH] hv_sock: use HV_HYP_PAGE_SIZE instead of PAGE_SIZE_4K
From: Sasha Levin @ 2019-10-04 15:48 UTC (permalink / raw)
To: Michael Kelley
Cc: Himadri Pandya, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
davem@davemloft.net, linux-hyperv@vger.kernel.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
himadri18.07
In-Reply-To: <MWHPR21MB078479F82BBA6D3E6527ECECD7DF0@MWHPR21MB0784.namprd21.prod.outlook.com>
On Wed, Jul 31, 2019 at 01:02:03AM +0000, Michael Kelley wrote:
>From: Himadri Pandya <himadrispandya@gmail.com> Sent: Wednesday, July 24, 2019 10:11 PM
>>
>> Older windows hosts require the hv_sock ring buffer to be defined
>> using 4K pages. This was achieved by using the symbol PAGE_SIZE_4K
>> defined specifically for this purpose. But now we have a new symbol
>> HV_HYP_PAGE_SIZE defined in hyperv-tlfs which can be used for this.
>>
>> This patch removes the definition of symbol PAGE_SIZE_4K and replaces
>> its usage with the symbol HV_HYP_PAGE_SIZE. This patch also aligns
>> sndbuf and rcvbuf to hyper-v specific page size using HV_HYP_PAGE_SIZE
>> instead of the guest page size(PAGE_SIZE) as hyper-v expects the page
>> size to be 4K and it might not be the case on ARM64 architecture.
>>
>> Signed-off-by: Himadri Pandya <himadri18.07@gmail.com>
>> ---
>> net/vmw_vsock/hyperv_transport.c | 21 +++++++++++----------
>> 1 file changed, 11 insertions(+), 10 deletions(-)
>>
>> diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
>> index f2084e3f7aa4..ecb5d72d8010 100644
>> --- a/net/vmw_vsock/hyperv_transport.c
>> +++ b/net/vmw_vsock/hyperv_transport.c
>> @@ -13,15 +13,16 @@
>> #include <linux/hyperv.h>
>> #include <net/sock.h>
>> #include <net/af_vsock.h>
>> +#include <asm/hyperv-tlfs.h>
>>
>
>Reviewed-by: Michael Kelley <mikelley@microsoft.com>
>
>This patch depends on a prerequisite patch in
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/hyperv
>
>that defines HV_HYP_PAGE_SIZE.
David, the above prerequisite patch is now upstream, so this patch
should be good to go. Would you take it through the net tree or should I
do it via the hyperv tree?
--
Thanks,
Sasha
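To illustrate why the alignment base matters, here is a minimal user-space sketch. The thread only states that Hyper-V expects 4K pages; the shift value of 12 and the `model_` helper names are assumptions made for illustration, not code from the patch.

```c
#include <stddef.h>

/* Assumed values: the thread only states that Hyper-V expects 4K pages. */
#define HV_HYP_PAGE_SHIFT	12
#define HV_HYP_PAGE_SIZE	((size_t)1 << HV_HYP_PAGE_SHIFT)

/* Round 'len' up to a whole number of Hyper-V pages. */
size_t model_hv_page_align(size_t len)
{
	return (len + HV_HYP_PAGE_SIZE - 1) & ~(HV_HYP_PAGE_SIZE - 1);
}

/*
 * Round 'len' up to a whole number of guest pages.
 * 'guest_page_size' must be a power of two.
 */
size_t model_guest_page_align(size_t len, size_t guest_page_size)
{
	return (len + guest_page_size - 1) & ~(guest_page_size - 1);
}
```

With a 64K guest page size, as is possible on ARM64, a 5000-byte request rounds up to 65536 bytes when aligned to the guest page but to 8192 bytes when aligned to the Hyper-V page, which is the mismatch the patch avoids.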
* Re: [PATCH] Drivers: hv: balloon: Remove dependencies on guest page size
From: Sasha Levin @ 2019-10-04 15:46 UTC (permalink / raw)
To: Michael Kelley
Cc: Himadri Pandya, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
tglx@linutronix.de, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, himadri18.07
In-Reply-To: <DM5PR21MB01379A15CEBBABFB0B165EDDD7B80@DM5PR21MB0137.namprd21.prod.outlook.com>
On Wed, Sep 04, 2019 at 11:37:12PM +0000, Michael Kelley wrote:
>From: Himadri Pandya <himadrispandya@gmail.com> Sent: Friday, August 16, 2019 9:09 PM
>>
>> Hyper-V assumes page size to be 4K. This might not be the case for
>> ARM64 architecture. Hence use hyper-v specific page size and page
>> shift definitions to avoid conflicts between different host and guest
>> page sizes on ARM64.
>>
>> Also, remove some old and incorrect comments and redefine ballooning
>> granularities to handle larger page sizes correctly.
>>
>> Signed-off-by: Himadri Pandya <himadri18.07@gmail.com>
>> ---
>> drivers/hv/hv_balloon.c | 25 ++++++++++++-------------
>> 1 file changed, 12 insertions(+), 13 deletions(-)
>>
>
>Reviewed-by: Michael Kelley <mikelley@microsoft.com>
>
>Thomas -- can you pick up this patch in the x86/hyperv branch of your
>tip tree along with the other patches to fix wrong page size assumptions?
I've queued this for hyperv-next, thanks!
--
Thanks,
Sasha
* Re: [PATCH 0/2] Drivers: hv: Remove dependencies on guest page size
From: Sasha Levin @ 2019-10-04 15:41 UTC (permalink / raw)
To: Michael Kelley
Cc: Himadri Pandya, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com,
x86@kernel.org, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, himadri18.07
In-Reply-To: <DM5PR21MB01377F433CD767AF5E917EA6D7B80@DM5PR21MB0137.namprd21.prod.outlook.com>
On Wed, Sep 04, 2019 at 11:41:43PM +0000, Michael Kelley wrote:
>From: Himadri Pandya <himadrispandya@gmail.com>
>>
>> Hyper-V assumes page size to be 4KB. This might not be the case on ARM64
>> architecture. The first patch in this patchset introduces a Hyper-V
>> specific function for allocating a zeroed page which can have a
>> different implementation on ARM64 to address the issue of different
>> guest and host page sizes. The second patch removes dependencies on
>> guest page size in vmbus by using hyper-v specific page symbol and
>> functions.
>>
>> Himadri Pandya (2):
>> x86: hv: Add function to allocate zeroed page for Hyper-V
>> Drivers: hv: vmbus: Remove dependencies on guest page size
>>
>> arch/x86/hyperv/hv_init.c | 8 ++++++++
>> arch/x86/include/asm/mshyperv.h | 1 +
>> drivers/hv/connection.c | 14 +++++++-------
>> drivers/hv/vmbus_drv.c | 6 +++---
>> 4 files changed, 19 insertions(+), 10 deletions(-)
>>
>> --
>> 2.17.1
>
>Thomas -- can you pick up this patch in the x86/hyperv branch of your
>tip tree along with the other patches to fix wrong page size assumptions?
I'll take it through the hyper-v tree, there's a bunch of similar work
queued up there already.
--
Thanks,
Sasha
* Re: [PATCH 0/2] Drivers: hv: Specify buffer size using Hyper-V page size
From: Sasha Levin @ 2019-10-04 15:29 UTC (permalink / raw)
To: Michael Kelley
Cc: Himadri Pandya, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
tglx@linutronix.de, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, himadri18.07
In-Reply-To: <DM5PR21MB01377E1E6DE541E902A12EEFD7B80@DM5PR21MB0137.namprd21.prod.outlook.com>
On Wed, Sep 04, 2019 at 11:40:21PM +0000, Michael Kelley wrote:
>From: Himadri Pandya <himadrispandya@gmail.com> Sent: Wednesday, July 24, 2019 10:03 PM
>>
>> recv_buffer and VMbus ring buffers are sized based on guest page size
>> which Hyper-V assumes to be 4KB. It might not be the case for some
>> architectures. Hence instead use the Hyper-V page size.
>>
>> Himadri Pandya (2):
>> Drivers: hv: Specify receive buffer size using Hyper-V page size
>> Drivers: hv: util: Specify ring buffer size using Hyper-V page size
>>
>> drivers/hv/hv_fcopy.c | 3 ++-
>> drivers/hv/hv_kvp.c | 3 ++-
>> drivers/hv/hv_snapshot.c | 3 ++-
>> drivers/hv/hv_util.c | 13 +++++++------
>> 4 files changed, 13 insertions(+), 9 deletions(-)
>>
>> --
>> 2.17.1
>
>Thomas -- can you pick up this patch set in the x86/hyperv branch
>of your tip tree along with the other patches to fix wrong page size
>assumptions?
I've queued these two for hyperv-next, thanks!
--
Thanks,
Sasha
* Re: [PATCH v4 3/4] xen: Mark "xen_nopvspin" parameter obsolete
From: Boris Ostrovsky @ 2019-10-04 14:57 UTC (permalink / raw)
To: Zhenzhong Duan, linux-kernel
Cc: vkuznets, linux-hyperv, kvm, kys, haiyangz, sthemmin, sashal,
tglx, mingo, bp, pbonzini, rkrcmar, sean.j.christopherson,
wanpengli, jmattson, joro, jgross, sstabellini, peterz,
Jonathan Corbet, H. Peter Anvin
In-Reply-To: <1570111335-12731-4-git-send-email-zhenzhong.duan@oracle.com>
On 10/3/19 10:02 AM, Zhenzhong Duan wrote:
> Map "xen_nopvspin" to "nopvspin", fix stale description of "xen_nopvspin"
> as we use qspinlock now.
>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
with a small nit
> void __init xen_init_spinlocks(void)
> {
> + if (nopvspin)
> + xen_pvspin = false;
>
> /* Don't need to use pvqspinlock code if there is only 1 vCPU. */
> if (num_possible_cpus() == 1)
I'd fold the change into this 'if' statement, I think it will still be
clear what the comment refers to.
-boris
* Re: [PATCH v4 1/4] x86/kvm: Add "nopvspin" parameter to disable PV spinlocks
From: Boris Ostrovsky @ 2019-10-04 14:52 UTC (permalink / raw)
To: Zhenzhong Duan, linux-kernel
Cc: vkuznets, linux-hyperv, kvm, kys, haiyangz, sthemmin, sashal,
tglx, mingo, bp, pbonzini, rkrcmar, sean.j.christopherson,
wanpengli, jmattson, joro, jgross, sstabellini, peterz,
Jonathan Corbet, H. Peter Anvin, Will Deacon
In-Reply-To: <1570111335-12731-2-git-send-email-zhenzhong.duan@oracle.com>
On 10/3/19 10:02 AM, Zhenzhong Duan wrote:
> void __init kvm_spinlock_init(void)
> {
> - /* Does host kernel support KVM_FEATURE_PV_UNHALT? */
> - if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
> - return;
> -
> - if (kvm_para_has_hint(KVM_HINTS_REALTIME))
> + /*
> + * Don't use the pvqspinlock code if no KVM_FEATURE_PV_UNHALT feature
> + * support, or there is REALTIME hints or only 1 vCPU.
> + */
> + if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT) ||
> + kvm_para_has_hint(KVM_HINTS_REALTIME) ||
> + num_possible_cpus() == 1) {
> + pr_info("PV spinlocks disabled\n");
> return;
> + }
>
> - /* Don't use the pvqspinlock code if there is only 1 vCPU. */
> - if (num_possible_cpus() == 1)
> + if (nopvspin) {
> + pr_info("PV spinlocks disabled forced by \"nopvspin\" parameter.\n");
> + static_branch_disable(&virt_spin_lock_key);
Would it make sense to bring here the other site where the key is
disabled (in kvm_smp_prepare_cpus())?
(and, in fact, shouldn't all of the checks that result in early return
above disable the key?)
-boris
> return;
> + }
> + pr_info("PV spinlocks enabled\n");
>
> __pv_init_lock_hash();
> pv_ops.lock.queued_spin_lock_slowpath = __pv_queued_spin_lock_slowpath;
>
^ permalink raw reply
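The question above concerns which exit paths of kvm_spinlock_init() disable virt_spin_lock_key. The following is a small userspace model of the control flow in the v4 patch as quoted, written only to make the paths easy to compare. The struct and function names are hypothetical, and the booleans stand in for the kernel queries named in the comments.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inputs standing in for the kernel queries used in the patch. */
struct kvm_guest_model {
	bool has_pv_unhalt;         /* kvm_para_has_feature(KVM_FEATURE_PV_UNHALT) */
	bool has_realtime_hint;     /* kvm_para_has_hint(KVM_HINTS_REALTIME) */
	unsigned int possible_cpus; /* num_possible_cpus() */
	bool nopvspin;              /* "nopvspin" boot parameter */
};

struct spinlock_init_result {
	bool pv_enabled;   /* PV slow path would be installed */
	bool key_disabled; /* static_branch_disable(&virt_spin_lock_key) reached */
};

/* Mirrors the order of checks in the v4 kvm_spinlock_init(). */
static struct spinlock_init_result
kvm_spinlock_init_model(const struct kvm_guest_model *g)
{
	struct spinlock_init_result r = { false, false };

	/* Combined early return: this function does not touch the key here. */
	if (!g->has_pv_unhalt || g->has_realtime_hint || g->possible_cpus == 1)
		return r;

	/* Only the "nopvspin" path disables the key inside this function. */
	if (g->nopvspin) {
		r.key_disabled = true;
		return r;
	}

	r.pv_enabled = true;
	return r;
}
```

In this model the single-vCPU, REALTIME-hint and missing-PV_UNHALT cases all return with key_disabled == false, which is the asymmetry the review comment points at. Whether each of those cases should disable the key is left open here, as it is in the review.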
* Re: [PATCH v6 0/2] hv: vmbus: add fuzz testing to hv device
From: Sasha Levin @ 2019-10-04 14:46 UTC (permalink / raw)
To: Branden Bonaby; +Cc: kys, haiyangz, sthemmin, linux-hyperv, linux-kernel
In-Reply-To: <cover.1570130325.git.brandonbonaby94@gmail.com>
On Thu, Oct 03, 2019 at 05:01:36PM -0400, Branden Bonaby wrote:
>This patchset introduces a testing framework for Hyper-V drivers.
>This framework allows us to introduce delays in the packet receive
>path on a per-device basis. While the current code only supports
>introducing arbitrary delays in the host/guest communication path,
>we intend to expand this to support error injection in the future.
I've queued it up for hyperv-next, thanks!
--
Thanks,
Sasha
^ permalink raw reply
* [PATCH v4 3/4] xen: Mark "xen_nopvspin" parameter obsolete
From: Zhenzhong Duan @ 2019-10-03 14:02 UTC (permalink / raw)
To: linux-kernel
Cc: vkuznets, linux-hyperv, kvm, kys, haiyangz, sthemmin, sashal,
tglx, mingo, bp, pbonzini, rkrcmar, sean.j.christopherson,
wanpengli, jmattson, joro, boris.ostrovsky, jgross, sstabellini,
peterz, Zhenzhong Duan, Jonathan Corbet, H. Peter Anvin
In-Reply-To: <1570111335-12731-1-git-send-email-zhenzhong.duan@oracle.com>
Map "xen_nopvspin" to "nopvspin", fix stale description of "xen_nopvspin"
as we use qspinlock now.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
Documentation/admin-guide/kernel-parameters.txt | 7 ++++---
arch/x86/xen/spinlock.c | 3 +++
2 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 89d77ea..df1eacc 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5303,8 +5303,9 @@
never -- do not unplug even if version check succeeds
xen_nopvspin [X86,XEN]
- Disables the ticketlock slowpath using Xen PV
- optimizations.
+ Disables the qspinlock slowpath using Xen PV optimizations.
+ This parameter is obsoleted by "nopvspin" parameter, which
+ has equivalent effect for XEN platform.
xen_nopv [X86]
Disables the PV optimizations forcing the HVM guest to
@@ -5330,7 +5331,7 @@
as generic guest with no PV drivers. Currently support
XEN HVM, KVM, HYPER_V and VMWARE guest.
- nopvspin [X86,KVM]
+ nopvspin [X86,XEN,KVM]
Disables the qspinlock slow path using PV optimizations
which allow the hypervisor to 'idle' the guest on lock
contention.
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 6deb490..bae29a4 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -114,6 +114,8 @@ void xen_uninit_lock_cpu(int cpu)
*/
void __init xen_init_spinlocks(void)
{
+ if (nopvspin)
+ xen_pvspin = false;
/* Don't need to use pvqspinlock code if there is only 1 vCPU. */
if (num_possible_cpus() == 1)
@@ -137,6 +139,7 @@ void __init xen_init_spinlocks(void)
static __init int xen_parse_nopvspin(char *arg)
{
+ pr_notice("\"xen_nopvspin\" is deprecated, please use \"nopvspin\" instead\n");
xen_pvspin = false;
return 0;
}
--
1.8.3.1
^ permalink raw reply related
* [PATCH v4 2/4] x86/kvm: Change print code to use pr_*() format
From: Zhenzhong Duan @ 2019-10-03 14:02 UTC (permalink / raw)
To: linux-kernel
Cc: vkuznets, linux-hyperv, kvm, kys, haiyangz, sthemmin, sashal,
tglx, mingo, bp, pbonzini, rkrcmar, sean.j.christopherson,
wanpengli, jmattson, joro, boris.ostrovsky, jgross, sstabellini,
peterz, Zhenzhong Duan, H. Peter Anvin
In-Reply-To: <1570111335-12731-1-git-send-email-zhenzhong.duan@oracle.com>
pr_*() is preferred over printk(KERN_* ...). After this change, all the
prints in arch/x86/kernel/kvm.c will have the "kvm_guest: xxx" style.
No functional change.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
arch/x86/kernel/kvm.c | 30 ++++++++++++++++--------------
1 file changed, 16 insertions(+), 14 deletions(-)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 481d879..a4bfe67 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -7,6 +7,8 @@
* Authors: Anthony Liguori <aliguori@us.ibm.com>
*/
+#define pr_fmt(fmt) "kvm_guest: " fmt
+
#include <linux/context_tracking.h>
#include <linux/init.h>
#include <linux/kernel.h>
@@ -286,8 +288,8 @@ static void kvm_register_steal_time(void)
return;
wrmsrl(MSR_KVM_STEAL_TIME, (slow_virt_to_phys(st) | KVM_MSR_ENABLED));
- pr_info("kvm-stealtime: cpu %d, msr %llx\n",
- cpu, (unsigned long long) slow_virt_to_phys(st));
+ pr_info("stealtime: cpu %d, msr %llx\n", cpu,
+ (unsigned long long) slow_virt_to_phys(st));
}
static DEFINE_PER_CPU_DECRYPTED(unsigned long, kvm_apic_eoi) = KVM_PV_EOI_DISABLED;
@@ -321,8 +323,7 @@ static void kvm_guest_cpu_init(void)
wrmsrl(MSR_KVM_ASYNC_PF_EN, pa);
__this_cpu_write(apf_reason.enabled, 1);
- printk(KERN_INFO"KVM setup async PF for cpu %d\n",
- smp_processor_id());
+ pr_info("setup async PF for cpu %d\n", smp_processor_id());
}
if (kvm_para_has_feature(KVM_FEATURE_PV_EOI)) {
@@ -347,8 +348,7 @@ static void kvm_pv_disable_apf(void)
wrmsrl(MSR_KVM_ASYNC_PF_EN, 0);
__this_cpu_write(apf_reason.enabled, 0);
- printk(KERN_INFO"Unregister pv shared memory for cpu %d\n",
- smp_processor_id());
+ pr_info("Unregister pv shared memory for cpu %d\n", smp_processor_id());
}
static void kvm_pv_guest_cpu_reboot(void *unused)
@@ -469,7 +469,8 @@ static void __send_ipi_mask(const struct cpumask *mask, int vector)
} else {
ret = kvm_hypercall4(KVM_HC_SEND_IPI, (unsigned long)ipi_bitmap,
(unsigned long)(ipi_bitmap >> BITS_PER_LONG), min, icr);
- WARN_ONCE(ret < 0, "KVM: failed to send PV IPI: %ld", ret);
+ WARN_ONCE(ret < 0, "kvm_guest: failed to send PV IPI: %ld",
+ ret);
min = max = apic_id;
ipi_bitmap = 0;
}
@@ -479,7 +480,8 @@ static void __send_ipi_mask(const struct cpumask *mask, int vector)
if (ipi_bitmap) {
ret = kvm_hypercall4(KVM_HC_SEND_IPI, (unsigned long)ipi_bitmap,
(unsigned long)(ipi_bitmap >> BITS_PER_LONG), min, icr);
- WARN_ONCE(ret < 0, "KVM: failed to send PV IPI: %ld", ret);
+ WARN_ONCE(ret < 0, "kvm_guest: failed to send PV IPI: %ld",
+ ret);
}
local_irq_restore(flags);
@@ -509,7 +511,7 @@ static void kvm_setup_pv_ipi(void)
{
apic->send_IPI_mask = kvm_send_ipi_mask;
apic->send_IPI_mask_allbutself = kvm_send_ipi_mask_allbutself;
- pr_info("KVM setup pv IPIs\n");
+ pr_info("setup pv IPIs\n");
}
static void kvm_smp_send_call_func_ipi(const struct cpumask *mask)
@@ -639,11 +641,11 @@ static void __init kvm_guest_init(void)
!kvm_para_has_hint(KVM_HINTS_REALTIME) &&
kvm_para_has_feature(KVM_FEATURE_STEAL_TIME)) {
smp_ops.send_call_func_ipi = kvm_smp_send_call_func_ipi;
- pr_info("KVM setup pv sched yield\n");
+ pr_info("setup pv sched yield\n");
}
if (cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "x86/kvm:online",
kvm_cpu_online, kvm_cpu_down_prepare) < 0)
- pr_err("kvm_guest: Failed to install cpu hotplug callbacks\n");
+ pr_err("failed to install cpu hotplug callbacks\n");
#else
sev_map_percpu_data();
kvm_guest_cpu_init();
@@ -746,7 +748,7 @@ static __init int kvm_setup_pv_tlb_flush(void)
zalloc_cpumask_var_node(per_cpu_ptr(&__pv_tlb_mask, cpu),
GFP_KERNEL, cpu_to_node(cpu));
}
- pr_info("KVM setup pv remote TLB flush\n");
+ pr_info("setup pv remote TLB flush\n");
}
return 0;
@@ -879,8 +881,8 @@ static void kvm_enable_host_haltpoll(void *i)
void arch_haltpoll_enable(unsigned int cpu)
{
if (!kvm_para_has_feature(KVM_FEATURE_POLL_CONTROL)) {
- pr_err_once("kvm: host does not support poll control\n");
- pr_err_once("kvm: host upgrade recommended\n");
+ pr_err_once("host does not support poll control\n");
+ pr_err_once("host upgrade recommended\n");
return;
}
--
1.8.3.1
^ permalink raw reply related
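As an illustration of why patch 2/4 can drop the hand-written "KVM"/"kvm:" prefixes, here is a userspace sketch of the pr_fmt() mechanism. The pr_fmt() definition is the one the patch adds; `pr_info_to_buf()` is a hypothetical stand-in for pr_info() that formats into a buffer instead of the kernel log so the result can be inspected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same definition the patch adds at the top of arch/x86/kernel/kvm.c. */
#define pr_fmt(fmt) "kvm_guest: " fmt

/*
 * Hypothetical userspace stand-in for pr_info(). pr_fmt() is applied to the
 * format string at compile time through string-literal concatenation, so
 * every message picks up the prefix without repeating it at the call site.
 * ##__VA_ARGS__ is a GNU extension accepted by GCC and Clang.
 */
#define pr_info_to_buf(buf, size, fmt, ...) \
	snprintf(buf, size, pr_fmt(fmt), ##__VA_ARGS__)
```

Under these assumptions, formatting "setup pv IPIs\n" yields "kvm_guest: setup pv IPIs\n". Note that the patch still spells the prefix out by hand in the WARN_ONCE() strings.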
* [PATCH v4 4/4] x86/hyperv: Mark "hv_nopvspin" parameter obsolete
From: Zhenzhong Duan @ 2019-10-03 14:02 UTC (permalink / raw)
To: linux-kernel
Cc: vkuznets, linux-hyperv, kvm, kys, haiyangz, sthemmin, sashal,
tglx, mingo, bp, pbonzini, rkrcmar, sean.j.christopherson,
wanpengli, jmattson, joro, boris.ostrovsky, jgross, sstabellini,
peterz, Zhenzhong Duan, Jonathan Corbet, H. Peter Anvin
In-Reply-To: <1570111335-12731-1-git-send-email-zhenzhong.duan@oracle.com>
Map "hv_nopvspin" to "nopvspin".
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
Documentation/admin-guide/kernel-parameters.txt | 6 +++++-
arch/x86/hyperv/hv_spinlock.c | 4 ++++
2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index df1eacc..08c6d34 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1436,6 +1436,10 @@
hv_nopvspin [X86,HYPER_V] Disables the paravirt spinlock optimizations
which allow the hypervisor to 'idle' the
guest on lock contention.
+ This parameter is obsoleted by "nopvspin"
+ parameter, which has equivalent effect for
+ HYPER_V platform.
+
keep_bootcon [KNL]
Do not unregister boot console at start. This is only
@@ -5331,7 +5335,7 @@
as generic guest with no PV drivers. Currently support
XEN HVM, KVM, HYPER_V and VMWARE guest.
- nopvspin [X86,XEN,KVM]
+ nopvspin [X86,XEN,KVM,HYPER_V]
Disables the qspinlock slow path using PV optimizations
which allow the hypervisor to 'idle' the guest on lock
contention.
diff --git a/arch/x86/hyperv/hv_spinlock.c b/arch/x86/hyperv/hv_spinlock.c
index 07f21a0..47c7d6c 100644
--- a/arch/x86/hyperv/hv_spinlock.c
+++ b/arch/x86/hyperv/hv_spinlock.c
@@ -64,6 +64,9 @@ __visible bool hv_vcpu_is_preempted(int vcpu)
void __init hv_init_spinlocks(void)
{
+ if (nopvspin)
+ hv_pvspin = false;
+
if (!hv_pvspin || !apic ||
!(ms_hyperv.hints & HV_X64_CLUSTER_IPI_RECOMMENDED) ||
!(ms_hyperv.features & HV_X64_MSR_GUEST_IDLE_AVAILABLE)) {
@@ -82,6 +85,7 @@ void __init hv_init_spinlocks(void)
static __init int hv_parse_nopvspin(char *arg)
{
+ pr_notice("\"hv_nopvspin\" is deprecated, please use \"nopvspin\" instead\n");
hv_pvspin = false;
return 0;
}
--
1.8.3.1
^ permalink raw reply related
* [PATCH v4 1/4] x86/kvm: Add "nopvspin" parameter to disable PV spinlocks
From: Zhenzhong Duan @ 2019-10-03 14:02 UTC (permalink / raw)
To: linux-kernel
Cc: vkuznets, linux-hyperv, kvm, kys, haiyangz, sthemmin, sashal,
tglx, mingo, bp, pbonzini, rkrcmar, sean.j.christopherson,
wanpengli, jmattson, joro, boris.ostrovsky, jgross, sstabellini,
peterz, Zhenzhong Duan, Jonathan Corbet, H. Peter Anvin,
Will Deacon
In-Reply-To: <1570111335-12731-1-git-send-email-zhenzhong.duan@oracle.com>
There are cases where a guest tries to switch spinlocks to bare metal
behavior (e.g. by setting "xen_nopvspin" on XEN platform and
"hv_nopvspin" on HYPER_V).
That feature is missing on KVM; add a new parameter "nopvspin" to disable
PV spinlocks for KVM guests.
The new 'nopvspin' parameter will also replace Xen and Hyper-V specific
parameters in future patches.
Define the variable nopvspin as global because it will be used in future
patches, as noted above.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
---
Documentation/admin-guide/kernel-parameters.txt | 5 +++++
arch/x86/include/asm/qspinlock.h | 1 +
arch/x86/kernel/kvm.c | 21 ++++++++++++++-------
kernel/locking/qspinlock.c | 7 +++++++
4 files changed, 27 insertions(+), 7 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index c7ac2f3..89d77ea 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5330,6 +5330,11 @@
as generic guest with no PV drivers. Currently support
XEN HVM, KVM, HYPER_V and VMWARE guest.
+ nopvspin [X86,KVM]
+ Disables the qspinlock slow path using PV optimizations
+ which allow the hypervisor to 'idle' the guest on lock
+ contention.
+
xirc2ps_cs= [NET,PCMCIA]
Format:
<irq>,<irq_mask>,<io>,<full_duplex>,<do_sound>,<lockup_hack>[,<irq2>[,<irq3>[,<irq4>]]]
diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
index 444d6fd..d86ab94 100644
--- a/arch/x86/include/asm/qspinlock.h
+++ b/arch/x86/include/asm/qspinlock.h
@@ -32,6 +32,7 @@ static __always_inline u32 queued_fetch_set_pending_acquire(struct qspinlock *lo
extern void __pv_init_lock_hash(void);
extern void __pv_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
extern void __raw_callee_save___pv_queued_spin_unlock(struct qspinlock *lock);
+extern bool nopvspin;
#define queued_spin_unlock queued_spin_unlock
/**
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index e820568..481d879 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -831,16 +831,23 @@ __visible bool __kvm_vcpu_is_preempted(long cpu)
*/
void __init kvm_spinlock_init(void)
{
- /* Does host kernel support KVM_FEATURE_PV_UNHALT? */
- if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
- return;
-
- if (kvm_para_has_hint(KVM_HINTS_REALTIME))
+ /*
+ * Don't use the pvqspinlock code if no KVM_FEATURE_PV_UNHALT feature
+ * support, or there is REALTIME hints or only 1 vCPU.
+ */
+ if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT) ||
+ kvm_para_has_hint(KVM_HINTS_REALTIME) ||
+ num_possible_cpus() == 1) {
+ pr_info("PV spinlocks disabled\n");
return;
+ }
- /* Don't use the pvqspinlock code if there is only 1 vCPU. */
- if (num_possible_cpus() == 1)
+ if (nopvspin) {
+ pr_info("PV spinlocks disabled forced by \"nopvspin\" parameter.\n");
+ static_branch_disable(&virt_spin_lock_key);
return;
+ }
+ pr_info("PV spinlocks enabled\n");
__pv_init_lock_hash();
pv_ops.lock.queued_spin_lock_slowpath = __pv_queued_spin_lock_slowpath;
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 2473f10..75193d6 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -580,4 +580,11 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
#include "qspinlock_paravirt.h"
#include "qspinlock.c"
+bool nopvspin __initdata;
+static __init int parse_nopvspin(char *arg)
+{
+ nopvspin = true;
+ return 0;
+}
+early_param("nopvspin", parse_nopvspin);
#endif
--
1.8.3.1
^ permalink raw reply related
* [PATCH v4 0/4] Add a unified parameter "nopvspin"
From: Zhenzhong Duan @ 2019-10-03 14:02 UTC (permalink / raw)
To: linux-kernel
Cc: vkuznets, linux-hyperv, kvm, kys, haiyangz, sthemmin, sashal,
tglx, mingo, bp, pbonzini, rkrcmar, sean.j.christopherson,
wanpengli, jmattson, joro, boris.ostrovsky, jgross, sstabellini,
peterz, Zhenzhong Duan
There are cases where folks want to disable the spinlock optimization for
debug/test purposes. Xen and Hyper-V already have the parameters
"xen_nopvspin" and "hv_nopvspin" to support that, but KVM doesn't.
The first patch adds that feature to KVM guest with "nopvspin".
For compatibility reasons, the original parameters "xen_nopvspin" and
"hv_nopvspin" are retained and marked obsolete.
v4:
PATCH1: use variable name nopvspin instead of pvspin and
defined it as __initdata, changed print message,
updated patch description [Sean Christopherson]
PATCH2: remove Suggested-by, use "kvm_guest:" prefix [Sean Christopherson]
PATCH3: make variable nopvspin and xen_pvspin coexist
remove Reviewed-by due to code change [Sean Christopherson]
PATCH4: make variable nopvspin and hv_pvspin coexist [Sean Christopherson]
v3:
PATCH2: Fix indentation
v2:
PATCH1: pick the print code change into separate PATCH2,
updated patch description [Vitaly Kuznetsov]
PATCH2: new patch with print code change [Vitaly Kuznetsov]
PATCH3: add Reviewed-by [Juergen Gross]
Zhenzhong Duan (4):
x86/kvm: Add "nopvspin" parameter to disable PV spinlocks
x86/kvm: Change print code to use pr_*() format
xen: Mark "xen_nopvspin" parameter obsolete
x86/hyperv: Mark "hv_nopvspin" parameter obsolete
Documentation/admin-guide/kernel-parameters.txt | 14 ++++++-
arch/x86/hyperv/hv_spinlock.c | 4 ++
arch/x86/include/asm/qspinlock.h | 1 +
arch/x86/kernel/kvm.c | 51 +++++++++++++++----------
arch/x86/xen/spinlock.c | 3 ++
kernel/locking/qspinlock.c | 7 ++++
6 files changed, 57 insertions(+), 23 deletions(-)
--
1.8.3.1
^ permalink raw reply
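The interplay between the unified parameter and the two retained legacy parameters described in the cover letter can be sketched in userspace as follows. This is only a model: `parse_cmdline()`, `opt_is()` and `init_spinlocks_model()` are hypothetical helpers, and the real series registers each option through the kernel's early parameter machinery rather than scanning the string by hand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static bool nopvspin;          /* unified flag, as in patch 1/4 */
static bool xen_pvspin = true; /* per-platform flags, as in patches 3/4, 4/4 */
static bool hv_pvspin = true;

/* True if the len-byte token at opt is exactly the option called name. */
static bool opt_is(const char *opt, size_t len, const char *name)
{
	return len == strlen(name) && strncmp(opt, name, len) == 0;
}

/* Handle one space-delimited boot option. */
static void handle_option(const char *opt, size_t len)
{
	if (opt_is(opt, len, "nopvspin")) {
		nopvspin = true;
	} else if (opt_is(opt, len, "xen_nopvspin")) {
		fprintf(stderr, "\"xen_nopvspin\" is deprecated, please use \"nopvspin\" instead\n");
		xen_pvspin = false;
	} else if (opt_is(opt, len, "hv_nopvspin")) {
		fprintf(stderr, "\"hv_nopvspin\" is deprecated, please use \"nopvspin\" instead\n");
		hv_pvspin = false;
	}
}

/* Hypothetical stand-in for the kernel's early parameter parsing. */
static void parse_cmdline(const char *cmdline)
{
	const char *p = cmdline;

	while (*p) {
		size_t len;

		p += strspn(p, " ");   /* skip separators */
		len = strcspn(p, " "); /* length of the next token */
		if (len)
			handle_option(p, len);
		p += len;
	}
}

/* Each platform init folds the unified flag into its own flag. */
static void init_spinlocks_model(void)
{
	if (nopvspin) {
		xen_pvspin = false;
		hv_pvspin = false;
	}
}
```

The legacy options keep their old effect and only add a deprecation notice, while "nopvspin" takes effect once the platform init code runs.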
* Re: [PATCH v2] x86/hyperv: make vapic support x2apic mode
From: Roman Kagan @ 2019-10-04 9:18 UTC (permalink / raw)
To: Michael Kelley
Cc: vkuznets, kvm@vger.kernel.org, Tianyu Lan, Joerg Roedel,
KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
x86@kernel.org, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org
In-Reply-To: <CY4PR21MB0136269170E69EA8F02A89E9D79E0@CY4PR21MB0136.namprd21.prod.outlook.com>
On Fri, Oct 04, 2019 at 03:01:51AM +0000, Michael Kelley wrote:
> From: Roman Kagan <rkagan@virtuozzo.com> Sent: Thursday, October 3, 2019 5:53 AM
> > >
> > > AFAIU you're trying to mirror native_x2apic_icr_write() here but this is
> > > different from what hv_apic_icr_write() does
> > > (SET_APIC_DEST_FIELD(id)).
> >
> > Right. In xapic mode the ICR2 aka the high 4 bytes of ICR is programmed
> > with the destination id in the highest byte; in x2apic mode the whole
> > ICR2 is set to the 32bit destination id.
> >
> > > Is it actually correct? (I think you've tested this and it is but)
> >
> > As I wrote in the commit log, I haven't tested it in the sense that I
> > ran a Linux guest in a Hyper-V VM exposing x2apic to the guest, because
> > I didn't manage to configure it to do so. OTOH I did run a Windows
> > guest in QEMU/KVM with hv_apic and x2apic enabled and saw it write
> > destination ids unshifted to the ICR2 part of ICR, so I assume it's
> > correct.
> >
> > > Michael, could you please shed some light here?
> >
> > Would be appreciated, indeed.
> >
>
> The newest version of Hyper-V provides an x2apic in a guest VM when the
> number of vCPUs in the VM is > 240. This version of Hyper-V is beginning
> to be deployed in Azure to enable the M416v2 VM size, but the functionality
> is not yet available for the on-premises version of Hyper-V. However, I can
> test this configuration internally with the above patch -- give me a few days.
>
> An additional complication is that when running on Intel processors that offer
> vAPIC functionality, the Hyper-V "hints" value does *not* recommend using the
> MSR-based APIC accesses. In this case, memory-mapped access to the x2apic
> registers is faster than the synthetic MSRs.
I guess you mean "using regular x2apic MSRs compared to the synthetic
MSRs". Indeed they do essentially the same thing, and there's no reason
for one set of MSRs to be significantly faster than the other. However,
hv_apic_eoi_write makes use of "apic assists" aka lazy EOI which is
certainly a win, and I'm not sure if it works without hv_apic.
> I've already looked at a VM that has
> the x2apic, and indeed that is the case, so the above code wouldn't run
> anyway. But I can temporarily code around that for testing purposes and see
> if everything works.
Thanks!
Roman.
^ permalink raw reply
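The xAPIC versus x2APIC difference in the ICR2 destination field discussed above can be illustrated with a short sketch. The function names are hypothetical; the 24-bit shift corresponds to the kernel's SET_APIC_DEST_FIELD() mentioned in the thread, and the x2APIC case follows the description that the whole of ICR2 carries the 32-bit destination id.

```c
#include <assert.h>
#include <stdint.h>

/* xAPIC: the 8-bit destination id sits in the highest byte of ICR2. */
static uint32_t icr2_xapic(uint8_t dest_id)
{
	return (uint32_t)dest_id << 24;
}

/* x2APIC: ICR2 holds the full 32-bit destination id, unshifted. */
static uint32_t icr2_x2apic(uint32_t dest_id)
{
	return dest_id;
}

/* Combine ICR2 (high 4 bytes) with the low ICR word into one 64-bit value. */
static uint64_t icr_compose(uint32_t icr2, uint32_t icr_low)
{
	return ((uint64_t)icr2 << 32) | icr_low;
}
```

For example, destination id 5 becomes 0x05000000 in ICR2 under xAPIC but stays 0x00000005 under x2APIC, so reusing the shifted form in x2APIC mode would address the wrong CPU.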
* Re: [RFC PATCH 00/13] vsock: add multi-transports support
From: Stefano Garzarella @ 2019-10-04 9:16 UTC (permalink / raw)
To: Dexuan Cui
Cc: netdev@vger.kernel.org, linux-hyperv@vger.kernel.org,
KY Srinivasan, Stefan Hajnoczi, Sasha Levin,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
David S. Miller, virtualization@lists.linux-foundation.org,
Stephen Hemminger, Jason Wang, Michael S. Tsirkin, Haiyang Zhang,
Jorgen Hansen
In-Reply-To: <PU1P153MB0169970A7DD4383F06CDAB60BF9E0@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>
On Fri, Oct 04, 2019 at 12:04:46AM +0000, Dexuan Cui wrote:
> > From: Stefano Garzarella <sgarzare@redhat.com>
> > Sent: Friday, September 27, 2019 4:27 AM
> > ...
> > Patch 9 changes the hvs_remote_addr_init(). setting the
> > VMADDR_CID_HOST as remote CID instead of VMADDR_CID_ANY to make
> > the choice of transport to be used work properly.
> > @Dexuan Could this change break anything?
>
> This patch looks good to me.
>
Thank you very much for your reviews!
> > @Dexuan please can you test on HyperV that I didn't break anything
> > even without nested VMs?
>
> I did some quick tests with the 13 patches in a Linux VM (this is not
> a nested VM) on Hyper-V and it looks nothing is broken. :-)
>
Great :-)
> > I'll try to setup a Windows host where to test the nested VMs
>
> I suppose you're going to run a Linux VM on a Hyper-V host,
> and the Linux VM itself runs KVM/VmWare so it can create its own child
> VMs. IMO this is similar to the test "nested KVM ( ..., virtio-transport[L1,L2]"
> you have done.
Yes, I think so. If the Hyper-V transport works well without nested VM,
it should work the same with a nested KVM/VMware.
Thanks,
Stefano
^ permalink raw reply
* RE: [PATCH v2] x86/hyperv: make vapic support x2apic mode
From: Michael Kelley @ 2019-10-04 3:01 UTC (permalink / raw)
To: Roman Kagan, vkuznets
Cc: kvm@vger.kernel.org, Tianyu Lan, Joerg Roedel, KY Srinivasan,
Haiyang Zhang, Stephen Hemminger, Sasha Levin, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, H. Peter Anvin, x86@kernel.org,
linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20191003125236.GA2424@rkaganb.sw.ru>
From: Roman Kagan <rkagan@virtuozzo.com> Sent: Thursday, October 3, 2019 5:53 AM
> >
> > AFAIU you're trying to mirror native_x2apic_icr_write() here but this is
> > different from what hv_apic_icr_write() does
> > (SET_APIC_DEST_FIELD(id)).
>
> Right. In xapic mode the ICR2 aka the high 4 bytes of ICR is programmed
> with the destination id in the highest byte; in x2apic mode the whole
> ICR2 is set to the 32bit destination id.
>
> > Is it actually correct? (I think you've tested this and it is but)
>
> As I wrote in the commit log, I haven't tested it in the sense that I
> ran a Linux guest in a Hyper-V VM exposing x2apic to the guest, because
> I didn't manage to configure it to do so. OTOH I did run a Windows
> guest in QEMU/KVM with hv_apic and x2apic enabled and saw it write
> destination ids unshifted to the ICR2 part of ICR, so I assume it's
> correct.
>
> > Michael, could you please shed some light here?
>
> Would be appreciated, indeed.
>
The newest version of Hyper-V provides an x2apic in a guest VM when the
number of vCPUs in the VM is > 240. This version of Hyper-V is beginning
to be deployed in Azure to enable the M416v2 VM size, but the functionality
is not yet available for the on-premises version of Hyper-V. However, I can
test this configuration internally with the above patch -- give me a few days.
An additional complication is that when running on Intel processors that offer
vAPIC functionality, the Hyper-V "hints" value does *not* recommend using the
MSR-based APIC accesses. In this case, memory-mapped access to the x2apic
registers is faster than the synthetic MSRs. I've already looked at a VM that has
the x2apic, and indeed that is the case, so the above code wouldn't run
anyway. But I can temporarily code around that for testing purposes and see
if everything works.
Michael
^ permalink raw reply
* RE: [RFC PATCH 00/13] vsock: add multi-transports support
From: Dexuan Cui @ 2019-10-04 0:04 UTC (permalink / raw)
To: Stefano Garzarella, netdev@vger.kernel.org
Cc: linux-hyperv@vger.kernel.org, KY Srinivasan, Stefan Hajnoczi,
Sasha Levin, linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
David S. Miller, virtualization@lists.linux-foundation.org,
Stephen Hemminger, Jason Wang, Michael S. Tsirkin, Haiyang Zhang,
Jorgen Hansen
In-Reply-To: <20190927112703.17745-1-sgarzare@redhat.com>
> From: Stefano Garzarella <sgarzare@redhat.com>
> Sent: Friday, September 27, 2019 4:27 AM
> ...
> Patch 9 changes the hvs_remote_addr_init(). setting the
> VMADDR_CID_HOST as remote CID instead of VMADDR_CID_ANY to make
> the choice of transport to be used work properly.
> @Dexuan Could this change break anything?
This patch looks good to me.
> @Dexuan please can you test on HyperV that I didn't break anything
> even without nested VMs?
I did some quick tests with the 13 patches in a Linux VM (this is not
a nested VM) on Hyper-V and it looks nothing is broken. :-)
> I'll try to setup a Windows host where to test the nested VMs
I suppose you're going to run a Linux VM on a Hyper-V host,
and the Linux VM itself runs KVM/VmWare so it can create its own child
VMs. IMO this is similar to the test "nested KVM ( ..., virtio-transport[L1,L2]"
you have done.
.
Thanks!
Dexuan
^ permalink raw reply