* RE: [PATCH v3 0/7] PCI: PM: Move to D0 before calling pci_legacy_resume_early()
From: Dexuan Cui @ 2019-10-15 18:24 UTC
To: Bjorn Helgaas
Cc: Rafael J . Wysocki, Lorenzo Pieralisi, Michael Kelley,
Sasha Levin, Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
olaf@aepfle.de, apw@canonical.com, jasowang@redhat.com, vkuznets,
marcelo.cerri@canonical.com, jackm@mellanox.com,
linux-pci@vger.kernel.org, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org,
driverdev-devel@linuxdriverproject.org, Bjorn Helgaas
In-Reply-To: <20191014230016.240912-1-helgaas@kernel.org>
> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: Monday, October 14, 2019 4:00 PM
> ...
>
> Dexuan, the important thing here is the first patch, which is your [1],
> which I modified by doing pci_restore_state() as well as setting to D0:
>
> pci_set_power_state(pci_dev, PCI_D0);
> pci_restore_state(pci_dev);
>
> I'm proposing some more patches on top. None are relevant to the problem
> you're solving; they're just minor doc and other updates in the same area.
>
> Rafael, if you have a chance to look at these, I'd appreciate it. I tried
> to make the doc match the code, but I'm no PM expert.
Thank you very much, Bjorn! The patchset looks good to me.
Thanks,
-- Dexuan
* Re: [PATCH 7/7] PCI/MSI: Move power state check out of pci_msi_supported()
From: Rafael J. Wysocki @ 2019-10-15 17:24 UTC
To: Bjorn Helgaas
Cc: Dexuan Cui, Rafael J . Wysocki, Lorenzo Pieralisi, Michael Kelley,
Sasha Levin, Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
olaf, apw, jasowang, vkuznets, marcelo.cerri, jackm, linux-pci,
linux-hyperv, linux-kernel, driverdev-devel, Bjorn Helgaas
In-Reply-To: <20191014230016.240912-8-helgaas@kernel.org>
On Tuesday, October 15, 2019 1:00:16 AM CEST Bjorn Helgaas wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
>
> 27e20603c54b ("PCI/MSI: Move D0 check into pci_msi_check_device()")
> moved the power state check into pci_msi_check_device(), which was
> subsequently renamed to pci_msi_supported(). This didn't change the
> behavior, since both callers checked the power state.
>
> However, it doesn't fit the current "pci_msi_supported()" name, which
> should return what the device is capable of, independent of the power
> state.
>
> Move the power state check back into the callers for readability. No
> functional change intended.
>
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
No issues found, so
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> drivers/pci/msi.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 0884bedcfc7a..20e9c729617c 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -861,7 +861,7 @@ static int pci_msi_supported(struct pci_dev *dev, int nvec)
> if (!pci_msi_enable)
> return 0;
>
> - if (!dev || dev->no_msi || dev->current_state != PCI_D0)
> + if (!dev || dev->no_msi)
> return 0;
>
> /*
> @@ -972,7 +972,7 @@ static int __pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries,
> int nr_entries;
> int i, j;
>
> - if (!pci_msi_supported(dev, nvec))
> + if (!pci_msi_supported(dev, nvec) || dev->current_state != PCI_D0)
> return -EINVAL;
>
> nr_entries = pci_msix_vec_count(dev);
> @@ -1058,7 +1058,7 @@ static int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
> int nvec;
> int rc;
>
> - if (!pci_msi_supported(dev, minvec))
> + if (!pci_msi_supported(dev, minvec) || dev->current_state != PCI_D0)
> return -EINVAL;
>
> /* Check whether driver already requested MSI-X IRQs */
>
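To illustrate the split this patch describes, here is a minimal userspace sketch, not the kernel code. Every name in it (mock_pci_dev, mock_msi_supported, mock_enable_msi, the MOCK_* states) is a hypothetical stand-in; the only thing taken from the patch is the shape of the condition, where the capability helper ignores the power state and each caller checks for D0 itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
enum mock_power_state { MOCK_D0, MOCK_D3HOT };

struct mock_pci_dev {
	bool no_msi;				/* device cannot use MSI at all */
	enum mock_power_state current_state;
};

/* Capability only: says nothing about the current power state. */
static bool mock_msi_supported(const struct mock_pci_dev *dev, int nvec)
{
	if (!dev || dev->no_msi)
		return false;

	return nvec >= 1;
}

/* The caller combines the capability check with the power state check. */
static int mock_enable_msi(const struct mock_pci_dev *dev, int nvec)
{
	if (!mock_msi_supported(dev, nvec) || dev->current_state != MOCK_D0)
		return -EINVAL;

	return nvec;
}

/* Usage example: a capable device is still rejected while not in D0. */
void example_msi_split(void)
{
	struct mock_pci_dev dev = { .no_msi = false, .current_state = MOCK_D3HOT };

	assert(mock_msi_supported(&dev, 1));
	assert(mock_enable_msi(&dev, 1) == -EINVAL);

	dev.current_state = MOCK_D0;
	assert(mock_enable_msi(&dev, 2) == 2);
}
```

Because the capability test comes first, the short-circuit keeps the power state comparison from dereferencing a NULL device pointer, which is also why the order of the two operands in the patched callers matters.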
* Re: [PATCH 6/7] PCI/PM: Wrap long lines in documentation
From: Rafael J. Wysocki @ 2019-10-15 17:23 UTC
To: Bjorn Helgaas
Cc: Dexuan Cui, Rafael J . Wysocki, Lorenzo Pieralisi, Michael Kelley,
Sasha Levin, Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
olaf, apw, jasowang, vkuznets, marcelo.cerri, jackm, linux-pci,
linux-hyperv, linux-kernel, driverdev-devel, Bjorn Helgaas
In-Reply-To: <20191014230016.240912-7-helgaas@kernel.org>
On Tuesday, October 15, 2019 1:00:15 AM CEST Bjorn Helgaas wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
>
> Documentation/power/pci.rst is wrapped to fit in 80 columns, but directory
> structure changes made a few lines longer. Wrap them so they all fit in 80
> columns again.
>
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Well, looks better this way. :-)
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> Documentation/power/pci.rst | 28 ++++++++++++++--------------
> 1 file changed, 14 insertions(+), 14 deletions(-)
>
> diff --git a/Documentation/power/pci.rst b/Documentation/power/pci.rst
> index 1525c594d631..db41a770a2f5 100644
> --- a/Documentation/power/pci.rst
> +++ b/Documentation/power/pci.rst
> @@ -426,12 +426,12 @@ pm->runtime_idle() callback.
> 2.4. System-Wide Power Transitions
> ----------------------------------
> There are a few different types of system-wide power transitions, described in
> -Documentation/driver-api/pm/devices.rst. Each of them requires devices to be handled
> -in a specific way and the PM core executes subsystem-level power management
> -callbacks for this purpose. They are executed in phases such that each phase
> -involves executing the same subsystem-level callback for every device belonging
> -to the given subsystem before the next phase begins. These phases always run
> -after tasks have been frozen.
> +Documentation/driver-api/pm/devices.rst. Each of them requires devices to be
> +handled in a specific way and the PM core executes subsystem-level power
> +management callbacks for this purpose. They are executed in phases such that
> +each phase involves executing the same subsystem-level callback for every device
> +belonging to the given subsystem before the next phase begins. These phases
> +always run after tasks have been frozen.
>
> 2.4.1. System Suspend
> ^^^^^^^^^^^^^^^^^^^^^
> @@ -636,12 +636,12 @@ System restore requires a hibernation image to be loaded into memory and the
> pre-hibernation memory contents to be restored before the pre-hibernation system
> activity can be resumed.
>
> -As described in Documentation/driver-api/pm/devices.rst, the hibernation image is loaded
> -into memory by a fresh instance of the kernel, called the boot kernel, which in
> -turn is loaded and run by a boot loader in the usual way. After the boot kernel
> -has loaded the image, it needs to replace its own code and data with the code
> -and data of the "hibernated" kernel stored within the image, called the image
> -kernel. For this purpose all devices are frozen just like before creating
> +As described in Documentation/driver-api/pm/devices.rst, the hibernation image
> +is loaded into memory by a fresh instance of the kernel, called the boot kernel,
> +which in turn is loaded and run by a boot loader in the usual way. After the
> +boot kernel has loaded the image, it needs to replace its own code and data with
> +the code and data of the "hibernated" kernel stored within the image, called the
> +image kernel. For this purpose all devices are frozen just like before creating
> the image during hibernation, in the
>
> prepare, freeze, freeze_noirq
> @@ -691,8 +691,8 @@ controlling the runtime power management of their devices.
>
> At the time of this writing there are two ways to define power management
> callbacks for a PCI device driver, the recommended one, based on using a
> -dev_pm_ops structure described in Documentation/driver-api/pm/devices.rst, and the
> -"legacy" one, in which the .suspend(), .suspend_late(), .resume_early(), and
> +dev_pm_ops structure described in Documentation/driver-api/pm/devices.rst, and
> +the "legacy" one, in which the .suspend(), .suspend_late(), .resume_early(), and
> .resume() callbacks from struct pci_driver are used. The legacy approach,
> however, doesn't allow one to define runtime power management callbacks and is
> not really suitable for any new drivers. Therefore it is not covered by this
>
* Re: [PATCH 5/7] PCI/PM: Make power management op coding style consistent
From: Rafael J. Wysocki @ 2019-10-15 17:22 UTC
To: Bjorn Helgaas
Cc: Dexuan Cui, Rafael J . Wysocki, Lorenzo Pieralisi, Michael Kelley,
Sasha Levin, Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
olaf, apw, jasowang, vkuznets, marcelo.cerri, jackm, linux-pci,
linux-hyperv, linux-kernel, driverdev-devel, Bjorn Helgaas
In-Reply-To: <20191014230016.240912-6-helgaas@kernel.org>
On Tuesday, October 15, 2019 1:00:14 AM CEST Bjorn Helgaas wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
>
> Some of the power management ops use this style:
>
> struct device_driver *drv = dev->driver;
> if (drv && drv->pm && drv->pm->prepare)
> drv->pm->prepare(dev);
>
> while others use this:
>
> const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
> if (pm && pm->runtime_resume)
> pm->runtime_resume(dev);
>
> Convert the first style to the second so they're all consistent. Remove
> local "error" variables when unnecessary. No functional change intended.
>
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
No issues found, so
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> drivers/pci/pci-driver.c | 76 +++++++++++++++++++---------------------
> 1 file changed, 36 insertions(+), 40 deletions(-)
>
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 55acb658273f..abbf5c39cb9c 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -679,11 +679,11 @@ static bool pci_has_legacy_pm_support(struct pci_dev *pci_dev)
>
> static int pci_pm_prepare(struct device *dev)
> {
> - struct device_driver *drv = dev->driver;
> struct pci_dev *pci_dev = to_pci_dev(dev);
> + const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
>
> - if (drv && drv->pm && drv->pm->prepare) {
> - int error = drv->pm->prepare(dev);
> + if (pm && pm->prepare) {
> + int error = pm->prepare(dev);
> if (error < 0)
> return error;
>
> @@ -917,8 +917,7 @@ static int pci_pm_suspend_noirq(struct device *dev)
> static int pci_pm_resume_noirq(struct device *dev)
> {
> struct pci_dev *pci_dev = to_pci_dev(dev);
> - struct device_driver *drv = dev->driver;
> - int error = 0;
> + const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
>
> if (dev_pm_may_skip_resume(dev))
> return 0;
> @@ -946,17 +945,16 @@ static int pci_pm_resume_noirq(struct device *dev)
> if (pci_has_legacy_pm_support(pci_dev))
> return pci_legacy_resume_early(dev);
>
> - if (drv && drv->pm && drv->pm->resume_noirq)
> - error = drv->pm->resume_noirq(dev);
> + if (pm && pm->resume_noirq)
> + return pm->resume_noirq(dev);
>
> - return error;
> + return 0;
> }
>
> static int pci_pm_resume(struct device *dev)
> {
> struct pci_dev *pci_dev = to_pci_dev(dev);
> const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
> - int error = 0;
>
> /*
> * This is necessary for the suspend error path in which resume is
> @@ -972,12 +970,12 @@ static int pci_pm_resume(struct device *dev)
>
> if (pm) {
> if (pm->resume)
> - error = pm->resume(dev);
> + return pm->resume(dev);
> } else {
> pci_pm_reenable_device(pci_dev);
> }
>
> - return error;
> + return 0;
> }
>
> #else /* !CONFIG_SUSPEND */
> @@ -1038,16 +1036,16 @@ static int pci_pm_freeze(struct device *dev)
> static int pci_pm_freeze_noirq(struct device *dev)
> {
> struct pci_dev *pci_dev = to_pci_dev(dev);
> - struct device_driver *drv = dev->driver;
> + const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
>
> if (pci_has_legacy_pm_support(pci_dev))
> return pci_legacy_suspend_late(dev, PMSG_FREEZE);
>
> - if (drv && drv->pm && drv->pm->freeze_noirq) {
> + if (pm && pm->freeze_noirq) {
> int error;
>
> - error = drv->pm->freeze_noirq(dev);
> - suspend_report_result(drv->pm->freeze_noirq, error);
> + error = pm->freeze_noirq(dev);
> + suspend_report_result(pm->freeze_noirq, error);
> if (error)
> return error;
> }
> @@ -1066,8 +1064,8 @@ static int pci_pm_freeze_noirq(struct device *dev)
> static int pci_pm_thaw_noirq(struct device *dev)
> {
> struct pci_dev *pci_dev = to_pci_dev(dev);
> - struct device_driver *drv = dev->driver;
> - int error = 0;
> + const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
> + int error;
>
> if (pcibios_pm_ops.thaw_noirq) {
> error = pcibios_pm_ops.thaw_noirq(dev);
> @@ -1091,10 +1089,10 @@ static int pci_pm_thaw_noirq(struct device *dev)
> if (pci_has_legacy_pm_support(pci_dev))
> return pci_legacy_resume_early(dev);
>
> - if (drv && drv->pm && drv->pm->thaw_noirq)
> - error = drv->pm->thaw_noirq(dev);
> + if (pm && pm->thaw_noirq)
> + return pm->thaw_noirq(dev);
>
> - return error;
> + return 0;
> }
>
> static int pci_pm_thaw(struct device *dev)
> @@ -1165,24 +1163,24 @@ static int pci_pm_poweroff_late(struct device *dev)
> static int pci_pm_poweroff_noirq(struct device *dev)
> {
> struct pci_dev *pci_dev = to_pci_dev(dev);
> - struct device_driver *drv = dev->driver;
> + const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
>
> if (dev_pm_smart_suspend_and_suspended(dev))
> return 0;
>
> - if (pci_has_legacy_pm_support(to_pci_dev(dev)))
> + if (pci_has_legacy_pm_support(pci_dev))
> return pci_legacy_suspend_late(dev, PMSG_HIBERNATE);
>
> - if (!drv || !drv->pm) {
> + if (!pm) {
> pci_fixup_device(pci_fixup_suspend_late, pci_dev);
> return 0;
> }
>
> - if (drv->pm->poweroff_noirq) {
> + if (pm->poweroff_noirq) {
> int error;
>
> - error = drv->pm->poweroff_noirq(dev);
> - suspend_report_result(drv->pm->poweroff_noirq, error);
> + error = pm->poweroff_noirq(dev);
> + suspend_report_result(pm->poweroff_noirq, error);
> if (error)
> return error;
> }
> @@ -1208,8 +1206,8 @@ static int pci_pm_poweroff_noirq(struct device *dev)
> static int pci_pm_restore_noirq(struct device *dev)
> {
> struct pci_dev *pci_dev = to_pci_dev(dev);
> - struct device_driver *drv = dev->driver;
> - int error = 0;
> + const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
> + int error;
>
> if (pcibios_pm_ops.restore_noirq) {
> error = pcibios_pm_ops.restore_noirq(dev);
> @@ -1223,17 +1221,16 @@ static int pci_pm_restore_noirq(struct device *dev)
> if (pci_has_legacy_pm_support(pci_dev))
> return pci_legacy_resume_early(dev);
>
> - if (drv && drv->pm && drv->pm->restore_noirq)
> - error = drv->pm->restore_noirq(dev);
> + if (pm && pm->restore_noirq)
> + return pm->restore_noirq(dev);
>
> - return error;
> + return 0;
> }
>
> static int pci_pm_restore(struct device *dev)
> {
> struct pci_dev *pci_dev = to_pci_dev(dev);
> const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
> - int error = 0;
>
> /*
> * This is necessary for the hibernation error path in which restore is
> @@ -1249,12 +1246,12 @@ static int pci_pm_restore(struct device *dev)
>
> if (pm) {
> if (pm->restore)
> - error = pm->restore(dev);
> + return pm->restore(dev);
> } else {
> pci_pm_reenable_device(pci_dev);
> }
>
> - return error;
> + return 0;
> }
>
> #else /* !CONFIG_HIBERNATE_CALLBACKS */
> @@ -1330,9 +1327,9 @@ static int pci_pm_runtime_suspend(struct device *dev)
>
> static int pci_pm_runtime_resume(struct device *dev)
> {
> - int rc = 0;
> struct pci_dev *pci_dev = to_pci_dev(dev);
> const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
> + int error = 0;
>
> /*
> * Restoring config space is necessary even if the device is not bound
> @@ -1348,18 +1345,17 @@ static int pci_pm_runtime_resume(struct device *dev)
> pci_pm_default_resume(pci_dev);
>
> if (pm && pm->runtime_resume)
> - rc = pm->runtime_resume(dev);
> + error = pm->runtime_resume(dev);
>
> pci_dev->runtime_d3cold = false;
>
> - return rc;
> + return error;
> }
>
> static int pci_pm_runtime_idle(struct device *dev)
> {
> struct pci_dev *pci_dev = to_pci_dev(dev);
> const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
> - int ret = 0;
>
> /*
> * If pci_dev->driver is not set (unbound), the device should
> @@ -1372,9 +1368,9 @@ static int pci_pm_runtime_idle(struct device *dev)
> return -ENOSYS;
>
> if (pm->runtime_idle)
> - ret = pm->runtime_idle(dev);
> + return pm->runtime_idle(dev);
>
> - return ret;
> + return 0;
> }
>
> static const struct dev_pm_ops pci_dev_pm_ops = {
>
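The second style from the commit message can be tried outside the kernel with a small sketch. The structures below are heavily reduced, hypothetical stand-ins (mock_device, mock_driver, mock_pm_ops), and failing_cb with its return value of -5 is made up for the example; only the lookup expression and the "return the callback result directly, else 0" shape follow the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct mock_device;

struct mock_pm_ops {
	int (*resume_noirq)(struct mock_device *dev);
};

struct mock_driver {
	const struct mock_pm_ops *pm;
};

struct mock_device {
	struct mock_driver *driver;
};

/* Fetch the ops pointer once, then return the callback result directly. */
static int mock_resume_noirq(struct mock_device *dev)
{
	const struct mock_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;

	if (pm && pm->resume_noirq)
		return pm->resume_noirq(dev);

	return 0;
}

/* Made-up callback that always fails, to show error propagation. */
static int failing_cb(struct mock_device *dev)
{
	(void)dev;
	return -5;
}

/* Usage example: bound with ops, bound without ops, and unbound devices. */
void example_pm_lookup(void)
{
	const struct mock_pm_ops ops = { .resume_noirq = failing_cb };
	struct mock_driver with_ops = { .pm = &ops };
	struct mock_driver without_ops = { .pm = NULL };
	struct mock_device a = { .driver = &with_ops };
	struct mock_device b = { .driver = &without_ops };
	struct mock_device c = { .driver = NULL };

	assert(mock_resume_noirq(&a) == -5);
	assert(mock_resume_noirq(&b) == 0);
	assert(mock_resume_noirq(&c) == 0);
}
```

With the single pm pointer, the local "error" variable is only needed where something else has to run after the callback, as in pci_pm_runtime_resume() in the diff above.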
* Re: [PATCH 4/7] PCI/PM: Run resume fixups before disabling wakeup events
From: Rafael J. Wysocki @ 2019-10-15 17:21 UTC
To: Bjorn Helgaas
Cc: Dexuan Cui, Rafael J . Wysocki, Lorenzo Pieralisi, Michael Kelley,
Sasha Levin, Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
olaf, apw, jasowang, vkuznets, marcelo.cerri, jackm, linux-pci,
linux-hyperv, linux-kernel, driverdev-devel, Bjorn Helgaas
In-Reply-To: <20191014230016.240912-5-helgaas@kernel.org>
On Tuesday, October 15, 2019 1:00:13 AM CEST Bjorn Helgaas wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
>
> pci_pm_resume() and pci_pm_restore() call pci_pm_default_resume(), which
> runs resume fixups before disabling wakeup events:
>
> static void pci_pm_default_resume(struct pci_dev *pci_dev)
> {
> pci_fixup_device(pci_fixup_resume, pci_dev);
> pci_enable_wake(pci_dev, PCI_D0, false);
> }
>
> pci_pm_runtime_resume() does both of these, but in the opposite order:
>
> pci_enable_wake(pci_dev, PCI_D0, false);
> pci_fixup_device(pci_fixup_resume, pci_dev);
>
> We should always use the same ordering unless there's a reason to do
> otherwise.
Right.
> Change pci_pm_runtime_resume() to call pci_pm_default_resume()
> instead of open-coding this, so the fixups are always done before disabling
> wakeup events.
>
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
No concerns about this change, so
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> drivers/pci/pci-driver.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 0c3086793e4e..55acb658273f 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1345,8 +1345,7 @@ static int pci_pm_runtime_resume(struct device *dev)
> return 0;
>
> pci_fixup_device(pci_fixup_resume_early, pci_dev);
> - pci_enable_wake(pci_dev, PCI_D0, false);
> - pci_fixup_device(pci_fixup_resume, pci_dev);
> + pci_pm_default_resume(pci_dev);
>
> if (pm && pm->runtime_resume)
> rc = pm->runtime_resume(dev);
>
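The ordering argument is easy to check with a toy model. This is a sketch under stated assumptions: the mock_* functions and the call_log buffer are hypothetical, and the two steps are reduced to appending a tag to a string. It shows only that routing both resume paths through one helper gives them the same order (fixups first, then disabling wakeup).

```c
#include <assert.h>
#include <string.h>

/* Records the order in which the mock steps run. */
static char call_log[64];

static void mock_fixup_resume(void)
{
	strcat(call_log, "fixup;");
}

static void mock_disable_wake(void)
{
	strcat(call_log, "wake_off;");
}

/* Shared helper: fixups run before wakeup events are disabled. */
static void mock_default_resume(void)
{
	mock_fixup_resume();
	mock_disable_wake();
}

/* Both paths call the helper instead of open-coding the two steps. */
static void mock_system_resume(void)
{
	mock_default_resume();
}

static void mock_runtime_resume(void)
{
	mock_default_resume();
}

/* Usage example: both paths produce the same call order. */
void example_resume_order(void)
{
	char system_order[sizeof(call_log)];

	call_log[0] = '\0';
	mock_system_resume();
	strcpy(system_order, call_log);

	call_log[0] = '\0';
	mock_runtime_resume();

	assert(strcmp(system_order, "fixup;wake_off;") == 0);
	assert(strcmp(call_log, system_order) == 0);
}
```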
* Re: [PATCH 3/7] PCI/PM: Clear PCIe PME Status even for legacy power management
From: Rafael J. Wysocki @ 2019-10-15 17:20 UTC
To: Bjorn Helgaas
Cc: Dexuan Cui, Rafael J . Wysocki, Lorenzo Pieralisi, Michael Kelley,
Sasha Levin, Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
olaf, apw, jasowang, vkuznets, marcelo.cerri, jackm, linux-pci,
linux-hyperv, linux-kernel, driverdev-devel, Bjorn Helgaas
In-Reply-To: <20191014230016.240912-4-helgaas@kernel.org>
On Tuesday, October 15, 2019 1:00:12 AM CEST Bjorn Helgaas wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
>
> Previously, pci_pm_resume_noirq() cleared the PME Status bit in the Root
> Status register only if the device had no driver or the driver did not
> implement legacy power management. It should clear PME Status regardless
> of what sort of power management the driver supports, so do this before
> checking for legacy power management.
>
> This affects Root Ports and Root Complex Event Collectors, for which the
> usual driver is the PCIe portdrv, which implements new power management, so
> this change is just on principle, not to fix any actual defects.
>
> Fixes: a39bd851dccf ("PCI/PM: Clear PCIe PME Status bit in core, not PCIe port driver")
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
This is a reasonable change in my view, so
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> drivers/pci/pci-driver.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index d4ac8ce8c1f9..0c3086793e4e 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -941,12 +941,11 @@ static int pci_pm_resume_noirq(struct device *dev)
> pci_pm_default_resume_early(pci_dev);
>
> pci_fixup_device(pci_fixup_resume_early, pci_dev);
> + pcie_pme_root_status_cleanup(pci_dev);
>
> if (pci_has_legacy_pm_support(pci_dev))
> return pci_legacy_resume_early(dev);
>
> - pcie_pme_root_status_cleanup(pci_dev);
> -
> if (drv && drv->pm && drv->pm->resume_noirq)
> error = drv->pm->resume_noirq(dev);
>
>
* Re: [PATCH 2/7] PCI/PM: Correct pci_pm_thaw_noirq() documentation
From: Rafael J. Wysocki @ 2019-10-15 17:17 UTC
To: Bjorn Helgaas
Cc: Dexuan Cui, Rafael J . Wysocki, Lorenzo Pieralisi, Michael Kelley,
Sasha Levin, Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
olaf, apw, jasowang, vkuznets, marcelo.cerri, jackm, linux-pci,
linux-hyperv, linux-kernel, driverdev-devel, Bjorn Helgaas
In-Reply-To: <20191014230016.240912-3-helgaas@kernel.org>
On Tuesday, October 15, 2019 1:00:11 AM CEST Bjorn Helgaas wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
>
> According to the documentation, pci_pm_thaw_noirq() did not put the device
> into the full-power state and restore its standard configuration registers.
> This is incorrect, so update the documentation to match the code.
>
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Right, the documentation is outdated, so
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> Documentation/power/pci.rst | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/power/pci.rst b/Documentation/power/pci.rst
> index 0e2ef7429304..1525c594d631 100644
> --- a/Documentation/power/pci.rst
> +++ b/Documentation/power/pci.rst
> @@ -600,17 +600,17 @@ using the following PCI bus type's callbacks::
>
> respectively.
>
> -The first of them, pci_pm_thaw_noirq(), is analogous to pci_pm_resume_noirq(),
> -but it doesn't put the device into the full power state and doesn't attempt to
> -restore its standard configuration registers. It also executes the device
> -driver's pm->thaw_noirq() callback, if defined, instead of pm->resume_noirq().
> +The first of them, pci_pm_thaw_noirq(), is analogous to pci_pm_resume_noirq().
> +It puts the device into the full power state and restores its standard
> +configuration registers. It also executes the device driver's pm->thaw_noirq()
> +callback, if defined, instead of pm->resume_noirq().
>
> The pci_pm_thaw() routine is similar to pci_pm_resume(), but it runs the device
> driver's pm->thaw() callback instead of pm->resume(). It is executed
> asynchronously for different PCI devices that don't depend on each other in a
> known way.
>
> -The complete phase it the same as for system resume.
> +The complete phase is the same as for system resume.
>
> After saving the image, devices need to be powered down before the system can
> enter the target sleep state (ACPI S4 for ACPI-based systems). This is done in
>
* Re: [PATCH 1/7] PCI/PM: Always return devices to D0 when thawing
From: Rafael J. Wysocki @ 2019-10-15 17:15 UTC
To: Bjorn Helgaas
Cc: Dexuan Cui, Rafael J . Wysocki, Lorenzo Pieralisi, Michael Kelley,
Sasha Levin, Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
olaf, apw, jasowang, vkuznets, marcelo.cerri, jackm, linux-pci,
linux-hyperv, linux-kernel, driverdev-devel, Bjorn Helgaas,
Linux PM
In-Reply-To: <20191014230016.240912-2-helgaas@kernel.org>
On Tuesday, October 15, 2019 1:00:10 AM CEST Bjorn Helgaas wrote:
> From: Dexuan Cui <decui@microsoft.com>
>
> pci_pm_thaw_noirq() is supposed to return the device to D0 and restore its
> configuration registers, but previously it only did that for devices whose
> drivers implemented the new power management ops.
>
> Hibernation, e.g., via "echo disk > /sys/power/state", involves freezing
> devices, creating a hibernation image, thawing devices, writing the image,
> and powering off. The fact that thawing did not return devices with legacy
> power management to D0 caused errors, e.g., in this path:
>
> pci_pm_thaw_noirq
> if (pci_has_legacy_pm_support(pci_dev)) # true for Mellanox VF driver
> return pci_legacy_resume_early(dev) # ... legacy PM skips the rest
> pci_set_power_state(pci_dev, PCI_D0)
> pci_restore_state(pci_dev)
> pci_pm_thaw
> if (pci_has_legacy_pm_support(pci_dev))
> pci_legacy_resume
> drv->resume
> mlx4_resume
> ...
> pci_enable_msix_range
> ...
> if (dev->current_state != PCI_D0) # <---
> return -EINVAL;
>
> which caused these warnings:
>
> mlx4_core a6d1:00:02.0: INTx is not supported in multi-function mode, aborting
> PM: dpm_run_callback(): pci_pm_thaw+0x0/0xd7 returns -95
> PM: Device a6d1:00:02.0 failed to thaw: error -95
>
> Return devices to D0 and restore config registers for all devices, not just
> those whose drivers support new power management.
>
> [bhelgaas: also call pci_restore_state() before pci_legacy_resume_early(),
> update comment, add stable tag, commit log]
> Link: https://lore.kernel.org/r/KU1P153MB016637CAEAD346F0AA8E3801BFAD0@KU1P153MB0166.APCP153.PROD.OUTLOOK.COM
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> Cc: stable@vger.kernel.org # v4.13+
No issues found, so
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> drivers/pci/pci-driver.c | 17 +++++++++++------
> 1 file changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index a8124e47bf6e..d4ac8ce8c1f9 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1076,17 +1076,22 @@ static int pci_pm_thaw_noirq(struct device *dev)
> return error;
> }
>
> - if (pci_has_legacy_pm_support(pci_dev))
> - return pci_legacy_resume_early(dev);
> -
> /*
> - * pci_restore_state() requires the device to be in D0 (because of MSI
> - * restoration among other things), so force it into D0 in case the
> - * driver's "freeze" callbacks put it into a low-power state directly.
> + * Both the legacy ->resume_early() and the new pm->thaw_noirq()
> + * callbacks assume the device has been returned to D0 and its
> + * config state has been restored.
> + *
> + * In addition, pci_restore_state() restores MSI-X state in MMIO
> + * space, which requires the device to be in D0, so return it to D0
> + * in case the driver's "freeze" callbacks put it into a low-power
> + * state.
> */
> pci_set_power_state(pci_dev, PCI_D0);
> pci_restore_state(pci_dev);
>
> + if (pci_has_legacy_pm_support(pci_dev))
> + return pci_legacy_resume_early(dev);
> +
> if (drv && drv->pm && drv->pm->thaw_noirq)
> error = drv->pm->thaw_noirq(dev);
>
>
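The effect of the reordering can be modeled in a few lines of userspace C. This is only a sketch: mock_pci_dev, its fields, and all mock_* functions are hypothetical, and the MSI-X enable step is reduced to the power state check shown in the call trace above. It assumes, as the commit log states, that the legacy path previously returned before the device was put back into D0.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not the kernel definitions. */
enum mock_state { MOCK_D0, MOCK_D3HOT };

struct mock_pci_dev {
	enum mock_state current_state;
	bool config_restored;
	bool legacy_pm;		/* driver uses the legacy callbacks */
};

static int mock_legacy_resume_early(struct mock_pci_dev *dev)
{
	(void)dev;
	return 0;
}

static int mock_new_thaw_noirq(struct mock_pci_dev *dev)
{
	(void)dev;
	return 0;
}

/* Patched order: D0 and config restore happen for every device first. */
static int mock_thaw_noirq(struct mock_pci_dev *dev)
{
	dev->current_state = MOCK_D0;
	dev->config_restored = true;

	if (dev->legacy_pm)
		return mock_legacy_resume_early(dev);

	return mock_new_thaw_noirq(dev);
}

/* Mirrors the power state check that returned -EINVAL in the trace. */
static int mock_enable_msix(const struct mock_pci_dev *dev)
{
	return dev->current_state == MOCK_D0 ? 0 : -EINVAL;
}

/* Usage example: a legacy-PM device frozen into D3hot thaws cleanly. */
void example_thaw_legacy(void)
{
	struct mock_pci_dev dev = {
		.current_state = MOCK_D3HOT,
		.config_restored = false,
		.legacy_pm = true,
	};

	assert(mock_enable_msix(&dev) == -EINVAL);	/* before thawing */
	assert(mock_thaw_noirq(&dev) == 0);
	assert(dev.config_restored);
	assert(mock_enable_msix(&dev) == 0);		/* after thawing */
}
```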
* Re: [PATCH v3 0/3] Drivers: hv: vmbus: Miscellaneous improvements
From: Wei Liu @ 2019-10-15 14:26 UTC
To: Andrea Parri
Cc: Linux Kernel List, Linux on Hyper-V List, K . Y . Srinivasan,
Haiyang Zhang, Stephen Hemminger, Sasha Levin, Michael Kelley,
Vitaly Kuznetsov, Dexuan Cui, Wei Liu
In-Reply-To: <20191015114646.15354-1-parri.andrea@gmail.com>
On Tue, 15 Oct 2019 at 12:47, Andrea Parri <parri.andrea@gmail.com> wrote:
[...]
> [1] https://lkml.kernel.org/r/20191010154600.23875-1-parri.andrea@gmail.com
> [2] https://lkml.kernel.org/r/20191007163115.26197-1-parri.andrea@gmail.com
>
> Andrea Parri (3):
> Drivers: hv: vmbus: Introduce table of VMBus protocol versions
> Drivers: hv: vmbus: Enable VMBus protocol versions 4.1, 5.1 and 5.2
> Drivers: hv: vmbus: Add module parameter to cap the VMBus version
For the whole series:
Reviewed-by: Wei Liu <wei.liu@kernel.org>
* RE: [PATCH v3 3/3] Drivers: hv: vmbus: Add module parameter to cap the VMBus version
From: Michael Kelley @ 2019-10-15 13:15 UTC
To: Andrea Parri, linux-kernel@vger.kernel.org,
linux-hyperv@vger.kernel.org
Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
vkuznets, Dexuan Cui, Wei Liu
In-Reply-To: <20191015114646.15354-4-parri.andrea@gmail.com>
From: Andrea Parri <parri.andrea@gmail.com> Sent: Tuesday, October 15, 2019 4:47 AM
>
> Currently, Linux guests negotiate the VMBus version with Hyper-V
> and use the highest available VMBus version they can connect to.
> This has some drawbacks: by using the highest available version,
> certain code paths are never executed and cannot be tested when
> the guest runs on the newest host.
>
> Add the module parameter "max_version", to upper-bound the VMBus
> versions guests can negotiate.
>
> Suggested-by: Dexuan Cui <decui@microsoft.com>
> Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
> ---
> drivers/hv/connection.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
* RE: [PATCH v3 2/3] Drivers: hv: vmbus: Enable VMBus protocol versions 4.1, 5.1 and 5.2
From: Michael Kelley @ 2019-10-15 13:12 UTC
To: Andrea Parri, linux-kernel@vger.kernel.org,
linux-hyperv@vger.kernel.org
Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
vkuznets, Dexuan Cui, Wei Liu
In-Reply-To: <20191015114646.15354-3-parri.andrea@gmail.com>
From: Andrea Parri <parri.andrea@gmail.com> Sent: Tuesday, October 15, 2019 4:47 AM
>
> Hyper-V has added VMBus protocol versions 5.1 and 5.2 in recent release
> versions. Allow Linux guests to negotiate these new protocol versions
> on versions of Hyper-V that support them. While at it, also allow
> guests to negotiate the VMBus protocol version 4.1 (which was missing).
>
> Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
> ---
> drivers/hv/connection.c | 13 ++++++++-----
> include/linux/hyperv.h | 8 +++++++-
> 2 files changed, 15 insertions(+), 6 deletions(-)
>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
* RE: [PATCH v3 1/3] Drivers: hv: vmbus: Introduce table of VMBus protocol versions
From: Michael Kelley @ 2019-10-15 13:08 UTC
To: Andrea Parri, linux-kernel@vger.kernel.org,
linux-hyperv@vger.kernel.org
Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
vkuznets, Dexuan Cui, Wei Liu
In-Reply-To: <20191015114646.15354-2-parri.andrea@gmail.com>
From: Andrea Parri <parri.andrea@gmail.com> Sent: Tuesday, October 15, 2019 4:47 AM
>
> The technique used to get the next VMBus version seems increasingly
> clumsy as the number of VMBus versions increases. Performance is
> not a concern since this is only done once during system boot; it's
> just that we'll end up with more lines of code than is really needed.
>
> As an alternative, introduce a table with the version numbers listed
> in order (from the most recent to the oldest). vmbus_connect() loops
> through the versions listed in the table until it gets an accepted
> connection or gets to the end of the table (invalid version).
>
> Suggested-by: Michael Kelley <mikelley@microsoft.com>
> Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
> ---
> drivers/hv/connection.c | 50 +++++++++++++++--------------------------
> drivers/hv/vmbus_drv.c | 3 +--
> include/linux/hyperv.h | 4 ----
> 3 files changed, 19 insertions(+), 38 deletions(-)
>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
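The table-driven negotiation described above might look roughly like the following userspace sketch. It is not the code from the patch: the version codes are placeholder numbers, and mock_negotiate, host_accepts_fn and old_host are hypothetical names. The max_version argument models the cap proposed in patch 3/3 of this series; skipping entries above the cap is an assumption about how such a cap could be applied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder version codes, listed from the most recent to the oldest. */
static const uint32_t mock_versions[] = { 50, 40, 30, 20, 10 };

#define MOCK_NVERSIONS	(sizeof(mock_versions) / sizeof(mock_versions[0]))
#define MOCK_VERSION_INVAL	0u

/* Returns nonzero if the (mock) host accepts the offered version. */
typedef int (*host_accepts_fn)(uint32_t version);

/*
 * Walk the table until the host accepts a version or the table ends.
 * Entries above max_version are skipped (assumed cap behavior).
 */
static uint32_t mock_negotiate(host_accepts_fn accepts, uint32_t max_version)
{
	size_t i;

	for (i = 0; i < MOCK_NVERSIONS; i++) {
		uint32_t version = mock_versions[i];

		if (version > max_version)
			continue;

		if (accepts(version))
			return version;
	}

	return MOCK_VERSION_INVAL;
}

/* Mock host that only understands versions up to 30. */
static int old_host(uint32_t version)
{
	return version <= 30;
}

/* Usage example: uncapped, capped, and impossible negotiations. */
void example_negotiate(void)
{
	assert(mock_negotiate(old_host, 50) == 30);
	assert(mock_negotiate(old_host, 20) == 20);
	assert(mock_negotiate(old_host, 5) == MOCK_VERSION_INVAL);
}
```

Adding a protocol version then means adding one table entry, rather than another branch in a "next version" helper.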
* RE: [PATCH v2] x86/hyperv: Set pv_info.name to "Hyper-V"
From: Michael Kelley @ 2019-10-15 13:06 UTC
To: Andrea Parri, linux-kernel@vger.kernel.org,
linux-hyperv@vger.kernel.org, x86@kernel.org
Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
vkuznets, Dexuan Cui, Wei Liu
In-Reply-To: <20191015103502.13156-1-parri.andrea@gmail.com>
From: Andrea Parri <parri.andrea@gmail.com> Sent: Tuesday, October 15, 2019 3:35 AM
>
> Michael reported that the x86/hyperv initialization code printed the
> following dmesg when running in a VM on Hyper-V:
>
> [ 0.000738] Booting paravirtualized kernel on bare hardware
>
> Let the x86/hyperv initialization code set pv_info.name to "Hyper-V";
> with this addition, the dmesg read:
>
> [ 0.000172] Booting paravirtualized kernel on Hyper-V
>
> Reported-by: Michael Kelley <mikelley@microsoft.com>
> Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
^ permalink raw reply
* [PATCH v3 3/3] Drivers: hv: vmbus: Add module parameter to cap the VMBus version
From: Andrea Parri @ 2019-10-15 11:46 UTC (permalink / raw)
To: linux-kernel, linux-hyperv
Cc: K . Y . Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
Michael Kelley, Vitaly Kuznetsov, Dexuan Cui, Wei Liu,
Andrea Parri
In-Reply-To: <20191015114646.15354-1-parri.andrea@gmail.com>
Currently, Linux guests negotiate the VMBus version with Hyper-V
and use the highest available VMBus version they can connect to.
This has some drawbacks: by using the highest available version,
certain code paths are never executed and cannot be tested when
the guest runs on the newest host.
Add the module parameter "max_version", to upper-bound the VMBus
versions guests can negotiate.
Suggested-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
---
drivers/hv/connection.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index cadfb34b38d80..0475be4356dd7 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -14,6 +14,7 @@
#include <linux/wait.h>
#include <linux/delay.h>
#include <linux/mm.h>
+#include <linux/module.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <linux/hyperv.h>
@@ -55,6 +56,16 @@ static __u32 vmbus_versions[] = {
VERSION_WS2008
};
+/*
+ * Maximal VMBus protocol version guests can negotiate. Useful to cap the
+ * VMBus version for testing and debugging purpose.
+ */
+static uint max_version = VERSION_WIN10_V5_2;
+
+module_param(max_version, uint, S_IRUGO);
+MODULE_PARM_DESC(max_version,
+ "Maximal VMBus protocol version which can be negotiated");
+
int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
{
int ret = 0;
@@ -240,6 +251,8 @@ int vmbus_connect(void)
goto cleanup;
version = vmbus_versions[i];
+ if (version > max_version)
+ continue;
ret = vmbus_negotiate_version(msginfo, version);
if (ret == -ETIMEDOUT)
--
2.23.0
^ permalink raw reply related
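The effect of the "max_version" cap in patch 3/3 above can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: first_offered_version() and check_cap_examples() are hypothetical names that do not exist in the kernel, and the real loop in vmbus_connect() goes on to call vmbus_negotiate_version() for each remaining entry instead of returning the first one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodings copied from include/linux/hyperv.h as of patch 2/3. */
#define VERSION_WS2008		((0 << 16) | (13))
#define VERSION_WIN7		((1 << 16) | (1))
#define VERSION_WIN8		((2 << 16) | (4))
#define VERSION_WIN8_1		((3 << 16) | (0))
#define VERSION_WIN10		((4 << 16) | (0))
#define VERSION_WIN10_V4_1	((4 << 16) | (1))
#define VERSION_WIN10_V5	((5 << 16) | (0))
#define VERSION_WIN10_V5_1	((5 << 16) | (1))
#define VERSION_WIN10_V5_2	((5 << 16) | (2))

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Table of VMBus versions listed from newest to oldest. */
static const uint32_t vmbus_versions[] = {
	VERSION_WIN10_V5_2, VERSION_WIN10_V5_1, VERSION_WIN10_V5,
	VERSION_WIN10_V4_1, VERSION_WIN10, VERSION_WIN8_1,
	VERSION_WIN8, VERSION_WIN7, VERSION_WS2008
};

/*
 * Hypothetical helper: the first table entry the guest would offer to the
 * host under the given cap, or 0 if the cap excludes every entry.
 */
static uint32_t first_offered_version(uint32_t max_version)
{
	size_t i;

	for (i = 0; i < ARRAY_SIZE(vmbus_versions); i++) {
		if (vmbus_versions[i] > max_version)
			continue;	/* same test as in the patch */
		return vmbus_versions[i];
	}
	return 0;
}

/* Hypothetical self-check showing a few caps and their outcome. */
static void check_cap_examples(void)
{
	/* The default cap does not exclude anything. */
	assert(first_offered_version(VERSION_WIN10_V5_2) == VERSION_WIN10_V5_2);
	/* Capping at 4.0 skips 5.2, 5.1, 5.0 and 4.1. */
	assert(first_offered_version(VERSION_WIN10) == VERSION_WIN10);
	/* A cap between two entries (3.5) falls back to the next lower one (3.0). */
	assert(first_offered_version((3 << 16) | 5) == VERSION_WIN8_1);
	/* A cap below 0.13 (WS2008) leaves nothing to offer. */
	assert(first_offered_version(12) == 0);
}
```

Two points follow from the model. The parameter takes the encoded value, so a cap at 4.0 corresponds to 262144, i.e. (4 << 16) | 0. And a cap lower than VERSION_WS2008 makes the loop in the patch run off the end of the table, so vmbus_connect() takes its cleanup path without negotiating any version.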
* [PATCH v3 2/3] Drivers: hv: vmbus: Enable VMBus protocol versions 4.1, 5.1 and 5.2
From: Andrea Parri @ 2019-10-15 11:46 UTC (permalink / raw)
To: linux-kernel, linux-hyperv
Cc: K . Y . Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
Michael Kelley, Vitaly Kuznetsov, Dexuan Cui, Wei Liu,
Andrea Parri
In-Reply-To: <20191015114646.15354-1-parri.andrea@gmail.com>
Hyper-V has added VMBus protocol versions 5.1 and 5.2 in recent release
versions. Allow Linux guests to negotiate these new protocol versions
on versions of Hyper-V that support them. While at it, also allow
guests to negotiate the VMBus protocol version 4.1 (which was missing).
Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
---
drivers/hv/connection.c | 13 ++++++++-----
include/linux/hyperv.h | 8 +++++++-
2 files changed, 15 insertions(+), 6 deletions(-)
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 8dc48f53c1ac4..cadfb34b38d80 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -44,7 +44,10 @@ EXPORT_SYMBOL_GPL(vmbus_proto_version);
* Table of VMBus versions listed from newest to oldest.
*/
static __u32 vmbus_versions[] = {
+ VERSION_WIN10_V5_2,
+ VERSION_WIN10_V5_1,
VERSION_WIN10_V5,
+ VERSION_WIN10_V4_1,
VERSION_WIN10,
VERSION_WIN8_1,
VERSION_WIN8,
@@ -68,12 +71,12 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
msg->vmbus_version_requested = version;
/*
- * VMBus protocol 5.0 (VERSION_WIN10_V5) requires that we must use
- * VMBUS_MESSAGE_CONNECTION_ID_4 for the Initiate Contact Message,
+ * VMBus protocol 5.0 (VERSION_WIN10_V5) and higher require that we must
+ * use VMBUS_MESSAGE_CONNECTION_ID_4 for the Initiate Contact Message,
* and for subsequent messages, we must use the Message Connection ID
* field in the host-returned Version Response Message. And, with
- * VERSION_WIN10_V5, we don't use msg->interrupt_page, but we tell
- * the host explicitly that we still use VMBUS_MESSAGE_SINT(2) for
+ * VERSION_WIN10_V5 and higher, we don't use msg->interrupt_page, but we
+ * tell the host explicitly that we still use VMBUS_MESSAGE_SINT(2) for
* compatibility.
*
* On old hosts, we should always use VMBUS_MESSAGE_CONNECTION_ID (1).
@@ -399,7 +402,7 @@ int vmbus_post_msg(void *buffer, size_t buflen, bool can_sleep)
case HV_STATUS_INVALID_CONNECTION_ID:
/*
* See vmbus_negotiate_version(): VMBus protocol 5.0
- * requires that we must use
+ * and higher require that we must use
* VMBUS_MESSAGE_CONNECTION_ID_4 for the Initiate
* Contact message, but on old hosts that only
* support VMBus protocol 4.0 or lower, here we get
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index c08b62dbd151f..f17f2cd22e39f 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -182,15 +182,21 @@ static inline u32 hv_get_avail_to_write_percent(
* 2 . 4 (Windows 8)
* 3 . 0 (Windows 8 R2)
* 4 . 0 (Windows 10)
+ * 4 . 1 (Windows 10 RS3)
* 5 . 0 (Newer Windows 10)
+ * 5 . 1 (Windows 10 RS4)
+ * 5 . 2 (Windows Server 2019, RS5)
*/
#define VERSION_WS2008 ((0 << 16) | (13))
#define VERSION_WIN7 ((1 << 16) | (1))
#define VERSION_WIN8 ((2 << 16) | (4))
#define VERSION_WIN8_1 ((3 << 16) | (0))
-#define VERSION_WIN10 ((4 << 16) | (0))
+#define VERSION_WIN10 ((4 << 16) | (0))
+#define VERSION_WIN10_V4_1 ((4 << 16) | (1))
#define VERSION_WIN10_V5 ((5 << 16) | (0))
+#define VERSION_WIN10_V5_1 ((5 << 16) | (1))
+#define VERSION_WIN10_V5_2 ((5 << 16) | (2))
/* Make maximum size of pipe payload of 16K */
#define MAX_PIPE_DATA_PAYLOAD (sizeof(u8) * 16384)
--
2.23.0
^ permalink raw reply related
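The VERSION_* macros in patch 2/3 above pack the major number into the upper 16 bits and the minor number into the lower 16 bits. A minimal userspace sketch of that encoding follows; the helper names vmbus_version(), vmbus_major(), vmbus_minor() and check_version_encoding() are hypothetical, since the kernel open-codes these expressions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build an encoded version from major and minor. */
static uint32_t vmbus_version(uint32_t major, uint32_t minor)
{
	return (major << 16) | (minor & 0xFFFF);
}

/* Hypothetical helpers: split an encoded version again. */
static uint32_t vmbus_major(uint32_t version)
{
	return version >> 16;
}

static uint32_t vmbus_minor(uint32_t version)
{
	return version & 0xFFFF;
}

/* Hypothetical self-check of the properties the patches rely on. */
static void check_version_encoding(void)
{
	/* 5.2 (VERSION_WIN10_V5_2) encodes to 0x50002 and decodes back. */
	assert(vmbus_version(5, 2) == 0x50002);
	assert(vmbus_major(0x50002) == 5);
	assert(vmbus_minor(0x50002) == 2);

	/* 0.13 (VERSION_WS2008) is simply 13. */
	assert(vmbus_version(0, 13) == 13);

	/* Numeric order follows protocol order: 4.0 < 4.1 < 5.0 < 5.1 < 5.2. */
	assert(vmbus_version(4, 0) < vmbus_version(4, 1));
	assert(vmbus_version(4, 1) < vmbus_version(5, 0));
	assert(vmbus_version(5, 0) < vmbus_version(5, 1));
	assert(vmbus_version(5, 1) < vmbus_version(5, 2));
}
```

Because the numeric order of the encoded values matches the protocol order, a plain integer comparison against an encoded version is enough to express "this version or higher".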
* [PATCH v3 1/3] Drivers: hv: vmbus: Introduce table of VMBus protocol versions
From: Andrea Parri @ 2019-10-15 11:46 UTC (permalink / raw)
To: linux-kernel, linux-hyperv
Cc: K . Y . Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
Michael Kelley, Vitaly Kuznetsov, Dexuan Cui, Wei Liu,
Andrea Parri
In-Reply-To: <20191015114646.15354-1-parri.andrea@gmail.com>
The technique used to get the next VMBus version seems increasingly
clumsy as the number of VMBus versions increases. Performance is
not a concern since this is only done once during system boot; it's
just that we'll end up with more lines of code than is really needed.
As an alternative, introduce a table with the version numbers listed
in order (from the most recent to the oldest). vmbus_connect() loops
through the versions listed in the table until it gets an accepted
connection or gets to the end of the table (invalid version).
Suggested-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
---
drivers/hv/connection.c | 50 +++++++++++++++--------------------------
drivers/hv/vmbus_drv.c | 3 +--
include/linux/hyperv.h | 4 ----
3 files changed, 19 insertions(+), 38 deletions(-)
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 6e4c015783ffc..8dc48f53c1ac4 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -40,29 +40,17 @@ EXPORT_SYMBOL_GPL(vmbus_connection);
__u32 vmbus_proto_version;
EXPORT_SYMBOL_GPL(vmbus_proto_version);
-static __u32 vmbus_get_next_version(__u32 current_version)
-{
- switch (current_version) {
- case (VERSION_WIN7):
- return VERSION_WS2008;
-
- case (VERSION_WIN8):
- return VERSION_WIN7;
-
- case (VERSION_WIN8_1):
- return VERSION_WIN8;
-
- case (VERSION_WIN10):
- return VERSION_WIN8_1;
-
- case (VERSION_WIN10_V5):
- return VERSION_WIN10;
-
- case (VERSION_WS2008):
- default:
- return VERSION_INVAL;
- }
-}
+/*
+ * Table of VMBus versions listed from newest to oldest.
+ */
+static __u32 vmbus_versions[] = {
+ VERSION_WIN10_V5,
+ VERSION_WIN10,
+ VERSION_WIN8_1,
+ VERSION_WIN8,
+ VERSION_WIN7,
+ VERSION_WS2008
+};
int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
{
@@ -169,8 +157,8 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
*/
int vmbus_connect(void)
{
- int ret = 0;
struct vmbus_channel_msginfo *msginfo = NULL;
+ int i, ret = 0;
__u32 version;
/* Initialize the vmbus connection */
@@ -244,21 +232,19 @@ int vmbus_connect(void)
* version.
*/
- version = VERSION_CURRENT;
+ for (i = 0; ; i++) {
+ if (i == ARRAY_SIZE(vmbus_versions))
+ goto cleanup;
+
+ version = vmbus_versions[i];
- do {
ret = vmbus_negotiate_version(msginfo, version);
if (ret == -ETIMEDOUT)
goto cleanup;
if (vmbus_connection.conn_state == CONNECTED)
break;
-
- version = vmbus_get_next_version(version);
- } while (version != VERSION_INVAL);
-
- if (version == VERSION_INVAL)
- goto cleanup;
+ }
vmbus_proto_version = version;
pr_info("Vmbus version:%d.%d\n",
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 53a60c81e220d..0ac874faf7209 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -2220,8 +2220,7 @@ static int vmbus_bus_resume(struct device *dev)
* We only use the 'vmbus_proto_version', which was in use before
* hibernation, to re-negotiate with the host.
*/
- if (vmbus_proto_version == VERSION_INVAL ||
- vmbus_proto_version == 0) {
+ if (!vmbus_proto_version) {
pr_err("Invalid proto version = 0x%x\n", vmbus_proto_version);
return -EINVAL;
}
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index b4a017093b697..c08b62dbd151f 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -192,10 +192,6 @@ static inline u32 hv_get_avail_to_write_percent(
#define VERSION_WIN10 ((4 << 16) | (0))
#define VERSION_WIN10_V5 ((5 << 16) | (0))
-#define VERSION_INVAL -1
-
-#define VERSION_CURRENT VERSION_WIN10_V5
-
/* Make maximum size of pipe payload of 16K */
#define MAX_PIPE_DATA_PAYLOAD (sizeof(u8) * 16384)
--
2.23.0
^ permalink raw reply related
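The table-driven negotiation introduced by patch 1/3 above can be sketched in userspace as follows. The names host_accepts() and negotiate_highest() are hypothetical, and the host model (accept anything up to host_max) is an assumption made for illustration; the real code calls vmbus_negotiate_version() and checks vmbus_connection.conn_state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodings copied from include/linux/hyperv.h as of patch 1/3. */
#define VERSION_WS2008		((0 << 16) | (13))
#define VERSION_WIN7		((1 << 16) | (1))
#define VERSION_WIN8		((2 << 16) | (4))
#define VERSION_WIN8_1		((3 << 16) | (0))
#define VERSION_WIN10		((4 << 16) | (0))
#define VERSION_WIN10_V5	((5 << 16) | (0))

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Table of VMBus versions listed from newest to oldest, as in the patch. */
static const uint32_t vmbus_versions[] = {
	VERSION_WIN10_V5,
	VERSION_WIN10,
	VERSION_WIN8_1,
	VERSION_WIN8,
	VERSION_WIN7,
	VERSION_WS2008
};

/* Assumed host model: accepts every version up to and including host_max. */
static int host_accepts(uint32_t version, uint32_t host_max)
{
	return version <= host_max;
}

/* Returns the newest accepted version, or 0 if the table is exhausted. */
static uint32_t negotiate_highest(uint32_t host_max)
{
	size_t i;

	for (i = 0; i < ARRAY_SIZE(vmbus_versions); i++) {
		/* The table must stay sorted from newest to oldest. */
		assert(i == 0 || vmbus_versions[i - 1] > vmbus_versions[i]);

		if (host_accepts(vmbus_versions[i], host_max))
			return vmbus_versions[i];
	}
	return 0;
}
```

Compared with the removed vmbus_get_next_version() switch, adding a protocol version in this scheme only means adding one table entry in the right position, which is what patch 2/3 does.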
* [PATCH v3 0/3] Drivers: hv: vmbus: Miscellaneous improvements
From: Andrea Parri @ 2019-10-15 11:46 UTC (permalink / raw)
To: linux-kernel, linux-hyperv
Cc: K . Y . Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
Michael Kelley, Vitaly Kuznetsov, Dexuan Cui, Wei Liu,
Andrea Parri
Hi all,
The patchset:
- refactors the VMBus negotiation code by introducing the table of
VMBus protocol versions (patch 1/3),
- enables VMBus protocol version 4.1, 5.1 and 5.2 (patch 2/3),
- introduces a module parameter to cap the VMBus protocol versions
which a guest can negotiate with the hypervisor (patch 3/3).
Thanks,
Andrea
---
Changes since v2 ([1]):
- refactor the loop exit path in vmbus_connect() (Michael Kelley)
- do not rename VERSION_WIN10 (Michael Kelley)
Changes since v1 ([2]):
- remove the VERSION_INVAL macro (Vitaly Kuznetsov and Dexuan Cui)
- make the table of VMBus protocol versions static (Dexuan Cui)
- enable VMBus protocol version 4.1 (Michael Kelley)
- introduce module parameter to cap the VMBus version (Dexuan Cui)
[1] https://lkml.kernel.org/r/20191010154600.23875-1-parri.andrea@gmail.com
[2] https://lkml.kernel.org/r/20191007163115.26197-1-parri.andrea@gmail.com
Andrea Parri (3):
Drivers: hv: vmbus: Introduce table of VMBus protocol versions
Drivers: hv: vmbus: Enable VMBus protocol versions 4.1, 5.1 and 5.2
Drivers: hv: vmbus: Add module parameter to cap the VMBus version
drivers/hv/connection.c | 72 +++++++++++++++++++++--------------------
drivers/hv/vmbus_drv.c | 3 +-
include/linux/hyperv.h | 12 ++++---
3 files changed, 45 insertions(+), 42 deletions(-)
--
2.23.0
^ permalink raw reply
* Re: [PATCH v2] x86/hyperv: Set pv_info.name to "Hyper-V"
From: Wei Liu @ 2019-10-15 10:56 UTC (permalink / raw)
To: Andrea Parri
Cc: Linux Kernel List, Linux on Hyper-V List, x86, K . Y . Srinivasan,
Haiyang Zhang, Stephen Hemminger, Sasha Levin, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, H . Peter Anvin, Michael Kelley,
Vitaly Kuznetsov, Dexuan Cui, Wei Liu
In-Reply-To: <20191015103502.13156-1-parri.andrea@gmail.com>
On Tue, 15 Oct 2019 at 11:35, Andrea Parri <parri.andrea@gmail.com> wrote:
>
> Michael reported that the x86/hyperv initialization code printed the
> following dmesg when running in a VM on Hyper-V:
>
> [ 0.000738] Booting paravirtualized kernel on bare hardware
>
> Let the x86/hyperv initialization code set pv_info.name to "Hyper-V";
> with this addition, the dmesg read:
>
> [ 0.000172] Booting paravirtualized kernel on Hyper-V
>
> Reported-by: Michael Kelley <mikelley@microsoft.com>
> Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
^ permalink raw reply
* [PATCH v2] x86/hyperv: Set pv_info.name to "Hyper-V"
From: Andrea Parri @ 2019-10-15 10:35 UTC (permalink / raw)
To: linux-kernel, linux-hyperv, x86
Cc: K . Y . Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
Michael Kelley, Vitaly Kuznetsov, Dexuan Cui, Wei Liu,
Andrea Parri
Michael reported that the x86/hyperv initialization code printed the
following dmesg when running in a VM on Hyper-V:
[ 0.000738] Booting paravirtualized kernel on bare hardware
Let the x86/hyperv initialization code set pv_info.name to "Hyper-V";
with this addition, the dmesg read:
[ 0.000172] Booting paravirtualized kernel on Hyper-V
Reported-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
---
Changes since v1 ([1]):
- move the setting of pv_info.name to ms_hyperv_init_platform() (Wei Liu)
[1] https://lkml.kernel.org/r/20191015092937.11244-1-parri.andrea@gmail.com
arch/x86/kernel/cpu/mshyperv.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 267daad8c0360..e7f0776e2a811 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -216,6 +216,8 @@ static void __init ms_hyperv_init_platform(void)
int hv_host_info_ecx;
int hv_host_info_edx;
+ pv_info.name = "Hyper-V";
+
/*
* Extract the features and hints
*/
--
2.23.0
^ permalink raw reply related
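The difference between the v1 and v2 placement of the assignment can be illustrated with a small userspace model. Everything here is a hypothetical stand-in: pv_name plays the role of pv_info.name, and the detect_*/boot_* functions only mimic the ordering of ms_hyperv_platform() and ms_hyperv_init_platform(). The model ignores that, in a real kernel, another hypervisor's own init code may set the name later.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Stand-in for pv_info.name. */
static const char *pv_name;

/* v1 ordering: the name is set before the CPUID signature is checked. */
static bool detect_v1(bool hypervisor_bit, const char *sig)
{
	if (!hypervisor_bit)
		return false;
	pv_name = "Hyper-V";	/* too early: the signature is still unknown */
	return strcmp(sig, "Microsoft Hv") == 0;
}

/* v2 ordering: detection has no side effects ... */
static bool detect_v2(bool hypervisor_bit, const char *sig)
{
	if (!hypervisor_bit)
		return false;
	return strcmp(sig, "Microsoft Hv") == 0;
}

/* ... and the name is set only in the init_platform step. */
static void init_platform_v2(void)
{
	pv_name = "Hyper-V";
}

static const char *boot_v1(bool hypervisor_bit, const char *sig)
{
	pv_name = "bare hardware";
	detect_v1(hypervisor_bit, sig);
	return pv_name;
}

static const char *boot_v2(bool hypervisor_bit, const char *sig)
{
	pv_name = "bare hardware";
	if (detect_v2(hypervisor_bit, sig))
		init_platform_v2();
	return pv_name;
}

/* Hypothetical self-check contrasting the two orderings. */
static void check_name_placement(void)
{
	/* On Hyper-V both orderings report "Hyper-V". */
	assert(strcmp(boot_v1(true, "Microsoft Hv"), "Hyper-V") == 0);
	assert(strcmp(boot_v2(true, "Microsoft Hv"), "Hyper-V") == 0);
	/* On a different hypervisor, v1 has already changed the name. */
	assert(strcmp(boot_v1(true, "KVMKVMKVM"), "Hyper-V") == 0);
	assert(strcmp(boot_v2(true, "KVMKVMKVM"), "bare hardware") == 0);
}
```

The third assertion shows the concern raised in the review of v1: the detection function can change the name on a guest that turns out not to be running on Hyper-V.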
* Re: [PATCH] x86/hyperv: Set pv_info.name to "Hyper-V"
From: Andrea Parri @ 2019-10-15 10:25 UTC (permalink / raw)
To: Wei Liu
Cc: Linux Kernel List, Linux on Hyper-V List, x86, K . Y . Srinivasan,
Haiyang Zhang, Stephen Hemminger, Sasha Levin, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, H . Peter Anvin, Michael Kelley,
Vitaly Kuznetsov, Dexuan Cui
In-Reply-To: <CAHd7Wqxn3sQQWkzOBrJ1KYm8eUpwa_9dcSYRDfPGAMWm=qvbag@mail.gmail.com>
> > @@ -154,6 +154,8 @@ static uint32_t __init ms_hyperv_platform(void)
>
> This function is for platform detection only.
>
> > if (!boot_cpu_has(X86_FEATURE_HYPERVISOR))
> > return 0;
> >
> > + pv_info.name = "Hyper-V";
> > +
>
> At this point we're not sure if Linux is really running on Hyper-V yet.
>
> Setting pv_info.name should be moved to the init_platform hook, i.e.
> ms_hyperv_init_platform.
Thank you for the review, Wei. I'll move this to the init_platform hook
and re-submit shortly.
Thanks,
Andrea
^ permalink raw reply
* Re: [PATCH] x86/hyperv: Set pv_info.name to "Hyper-V"
From: Wei Liu @ 2019-10-15 9:39 UTC (permalink / raw)
To: Andrea Parri
Cc: Linux Kernel List, Linux on Hyper-V List, x86, K . Y . Srinivasan,
Haiyang Zhang, Stephen Hemminger, Sasha Levin, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, H . Peter Anvin, Michael Kelley,
Vitaly Kuznetsov, Dexuan Cui
In-Reply-To: <20191015092937.11244-1-parri.andrea@gmail.com>
On Tue, 15 Oct 2019 at 10:30, Andrea Parri <parri.andrea@gmail.com> wrote:
>
> Michael reported that the x86/hyperv initialization code printed the
> following dmesg when running in a VM on Hyper-V:
>
> [ 0.000738] Booting paravirtualized kernel on bare hardware
>
> Let the x86/hyperv initialization code set pv_info.name to "Hyper-V";
> with this addition, the dmesg read:
>
> [ 0.000138] Booting paravirtualized kernel on Hyper-V
>
> Reported-by: Michael Kelley <mikelley@microsoft.com>
> Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
> ---
> arch/x86/kernel/cpu/mshyperv.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 105844d542e5..c7d1801fa88b 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -154,6 +154,8 @@ static uint32_t __init ms_hyperv_platform(void)
This function is for platform detection only.
> if (!boot_cpu_has(X86_FEATURE_HYPERVISOR))
> return 0;
>
> + pv_info.name = "Hyper-V";
> +
At this point we're not sure if Linux is really running on Hyper-V yet.
Setting pv_info.name should be moved to the init_platform hook, i.e.
ms_hyperv_init_platform.
Wei.
> cpuid(HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS,
> &eax, &hyp_signature[0], &hyp_signature[1], &hyp_signature[2]);
>
> --
> 2.17.1
>
^ permalink raw reply
* [PATCH] x86/hyperv: Set pv_info.name to "Hyper-V"
From: Andrea Parri @ 2019-10-15 9:29 UTC (permalink / raw)
To: linux-kernel, linux-hyperv, x86
Cc: K . Y . Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
Michael Kelley, Vitaly Kuznetsov, Dexuan Cui, Andrea Parri
Michael reported that the x86/hyperv initialization code printed the
following dmesg when running in a VM on Hyper-V:
[ 0.000738] Booting paravirtualized kernel on bare hardware
Let the x86/hyperv initialization code set pv_info.name to "Hyper-V";
with this addition, the dmesg read:
[ 0.000138] Booting paravirtualized kernel on Hyper-V
Reported-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
---
arch/x86/kernel/cpu/mshyperv.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 105844d542e5..c7d1801fa88b 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -154,6 +154,8 @@ static uint32_t __init ms_hyperv_platform(void)
if (!boot_cpu_has(X86_FEATURE_HYPERVISOR))
return 0;
+ pv_info.name = "Hyper-V";
+
cpuid(HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS,
&eax, &hyp_signature[0], &hyp_signature[1], &hyp_signature[2]);
--
2.17.1
^ permalink raw reply related
* [PATCH v6 1/5] Revert "KVM: X86: Fix setup the virt_spin_lock_key before static key get initialized"
From: Zhenzhong Duan @ 2019-10-15 1:19 UTC (permalink / raw)
To: linux-kernel
Cc: vkuznets, linux-hyperv, kvm, kys, haiyangz, sthemmin, sashal,
tglx, mingo, bp, pbonzini, rkrcmar, sean.j.christopherson,
wanpengli, jmattson, joro, boris.ostrovsky, jgross, sstabellini,
peterz, Zhenzhong Duan, H. Peter Anvin
In-Reply-To: <1571102367-31595-1-git-send-email-zhenzhong.duan@oracle.com>
This reverts commit 34226b6b70980a8f81fff3c09a2c889f77edeeff.
Commit 8990cac6e5ea ("x86/jump_label: Initialize static branching
early") adds a jump_label_init() call to setup_arch() so that static
keys are initialized early, which lets us use the original, simpler
code again.
A similar change for Xen is in commit 090d54bcbc54 ("Revert
"x86/paravirt: Set up the virt_spin_lock_key after static keys get
initialized"").
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
arch/x86/kernel/kvm.c | 12 +++---------
1 file changed, 3 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index e820568..3bc6a266 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -527,13 +527,6 @@ static void kvm_smp_send_call_func_ipi(const struct cpumask *mask)
}
}
-static void __init kvm_smp_prepare_cpus(unsigned int max_cpus)
-{
- native_smp_prepare_cpus(max_cpus);
- if (kvm_para_has_hint(KVM_HINTS_REALTIME))
- static_branch_disable(&virt_spin_lock_key);
-}
-
static void __init kvm_smp_prepare_boot_cpu(void)
{
/*
@@ -633,7 +626,6 @@ static void __init kvm_guest_init(void)
apic_set_eoi_write(kvm_guest_apic_eoi_write);
#ifdef CONFIG_SMP
- smp_ops.smp_prepare_cpus = kvm_smp_prepare_cpus;
smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
if (kvm_para_has_feature(KVM_FEATURE_PV_SCHED_YIELD) &&
!kvm_para_has_hint(KVM_HINTS_REALTIME) &&
@@ -835,8 +827,10 @@ void __init kvm_spinlock_init(void)
if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
return;
- if (kvm_para_has_hint(KVM_HINTS_REALTIME))
+ if (kvm_para_has_hint(KVM_HINTS_REALTIME)) {
+ static_branch_disable(&virt_spin_lock_key);
return;
+ }
/* Don't use the pvqspinlock code if there is only 1 vCPU. */
if (num_possible_cpus() == 1)
--
1.8.3.1
^ permalink raw reply related
* [PATCH v6 5/5] x86/hyperv: Mark "hv_nopvspin" parameter obsolete
From: Zhenzhong Duan @ 2019-10-15 1:19 UTC (permalink / raw)
To: linux-kernel
Cc: vkuznets, linux-hyperv, kvm, kys, haiyangz, sthemmin, sashal,
tglx, mingo, bp, pbonzini, rkrcmar, sean.j.christopherson,
wanpengli, jmattson, joro, boris.ostrovsky, jgross, sstabellini,
peterz, Zhenzhong Duan, Jonathan Corbet, H. Peter Anvin
In-Reply-To: <1571102367-31595-1-git-send-email-zhenzhong.duan@oracle.com>
Map "hv_nopvspin" to "nopvspin".
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
Documentation/admin-guide/kernel-parameters.txt | 6 +++++-
arch/x86/hyperv/hv_spinlock.c | 4 ++++
2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 85059dd..78648bb 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1436,6 +1436,10 @@
hv_nopvspin [X86,HYPER_V] Disables the paravirt spinlock optimizations
which allow the hypervisor to 'idle' the
guest on lock contention.
+ This parameter is obsoleted by "nopvspin"
+ parameter, which has equivalent effect for
+ HYPER_V platform.
+
keep_bootcon [KNL]
Do not unregister boot console at start. This is only
@@ -5335,7 +5339,7 @@
as generic guest with no PV drivers. Currently support
XEN HVM, KVM, HYPER_V and VMWARE guest.
- nopvspin [X86,XEN,KVM]
+ nopvspin [X86,XEN,KVM,HYPER_V]
Disables the qspinlock slow path using PV optimizations
which allow the hypervisor to 'idle' the guest on lock
contention.
diff --git a/arch/x86/hyperv/hv_spinlock.c b/arch/x86/hyperv/hv_spinlock.c
index 07f21a0..47c7d6c 100644
--- a/arch/x86/hyperv/hv_spinlock.c
+++ b/arch/x86/hyperv/hv_spinlock.c
@@ -64,6 +64,9 @@ __visible bool hv_vcpu_is_preempted(int vcpu)
void __init hv_init_spinlocks(void)
{
+ if (nopvspin)
+ hv_pvspin = false;
+
if (!hv_pvspin || !apic ||
!(ms_hyperv.hints & HV_X64_CLUSTER_IPI_RECOMMENDED) ||
!(ms_hyperv.features & HV_X64_MSR_GUEST_IDLE_AVAILABLE)) {
@@ -82,6 +85,7 @@ void __init hv_init_spinlocks(void)
static __init int hv_parse_nopvspin(char *arg)
{
+ pr_notice("\"hv_nopvspin\" is deprecated, please use \"nopvspin\" instead\n");
hv_pvspin = false;
return 0;
}
--
1.8.3.1
^ permalink raw reply related
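How the deprecated "hv_nopvspin" and the generic "nopvspin" end up with the same effect in patch 5/5 above can be sketched as follows. The struct and function names are hypothetical, and the model covers only the flag handling: the real hv_init_spinlocks() additionally checks apic and the ms_hyperv hints and features before enabling PV spinlocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical container for the two flags touched by the patch. */
struct spin_params {
	bool nopvspin;		/* generic flag set by "nopvspin" */
	bool hv_pvspin;		/* Hyper-V flag, true by default */
	bool deprecation_noted;	/* stands in for the pr_notice() call */
};

/* Models the two early parameter handlers. */
static void parse_param(struct spin_params *p, const char *name)
{
	if (strcmp(name, "nopvspin") == 0) {
		p->nopvspin = true;
	} else if (strcmp(name, "hv_nopvspin") == 0) {
		p->deprecation_noted = true;
		p->hv_pvspin = false;
	}
}

/* Models the check added at the top of hv_init_spinlocks(). */
static bool hv_pvspin_effective(const struct spin_params *p)
{
	return p->hv_pvspin && !p->nopvspin;
}

/* Hypothetical convenience wrapper: parse one parameter, report the result. */
static bool hv_pvspin_after(const char *name)
{
	struct spin_params p = { .hv_pvspin = true };

	parse_param(&p, name);
	return hv_pvspin_effective(&p);
}

/* Hypothetical self-check: both spellings disable PV spinlocks. */
static void check_nopvspin_aliases(void)
{
	struct spin_params p = { .hv_pvspin = true };

	assert(hv_pvspin_after("nopvspin") == false);
	assert(hv_pvspin_after("hv_nopvspin") == false);
	assert(hv_pvspin_after("quiet") == true);

	/* Only the old spelling triggers the deprecation notice. */
	parse_param(&p, "hv_nopvspin");
	assert(p.deprecation_noted);
}
```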
* [PATCH v6 3/5] x86/kvm: Add "nopvspin" parameter to disable PV spinlocks
From: Zhenzhong Duan @ 2019-10-15 1:19 UTC (permalink / raw)
To: linux-kernel
Cc: vkuznets, linux-hyperv, kvm, kys, haiyangz, sthemmin, sashal,
tglx, mingo, bp, pbonzini, rkrcmar, sean.j.christopherson,
wanpengli, jmattson, joro, boris.ostrovsky, jgross, sstabellini,
peterz, Zhenzhong Duan, Jonathan Corbet, H. Peter Anvin,
Will Deacon
In-Reply-To: <1571102367-31595-1-git-send-email-zhenzhong.duan@oracle.com>
There are cases where a guest tries to switch spinlocks to bare metal
behavior (e.g. by setting "xen_nopvspin" on the XEN platform and
"hv_nopvspin" on HYPER_V).
That feature is missing on KVM, so add a new parameter "nopvspin" to
disable PV spinlocks for KVM guests.
The new "nopvspin" parameter will also replace the Xen and Hyper-V
specific parameters in future patches.
Define the variable nopvspin as global because it will be used in those
future patches.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
---
Documentation/admin-guide/kernel-parameters.txt | 5 ++++
arch/x86/include/asm/qspinlock.h | 1 +
arch/x86/kernel/kvm.c | 34 ++++++++++++++++++++++---
kernel/locking/qspinlock.c | 7 +++++
4 files changed, 43 insertions(+), 4 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a84a83f..bd49ed2 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5334,6 +5334,11 @@
as generic guest with no PV drivers. Currently support
XEN HVM, KVM, HYPER_V and VMWARE guest.
+ nopvspin [X86,KVM]
+ Disables the qspinlock slow path using PV optimizations
+ which allow the hypervisor to 'idle' the guest on lock
+ contention.
+
xirc2ps_cs= [NET,PCMCIA]
Format:
<irq>,<irq_mask>,<io>,<full_duplex>,<do_sound>,<lockup_hack>[,<irq2>[,<irq3>[,<irq4>]]]
diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
index 444d6fd..d86ab94 100644
--- a/arch/x86/include/asm/qspinlock.h
+++ b/arch/x86/include/asm/qspinlock.h
@@ -32,6 +32,7 @@ static __always_inline u32 queued_fetch_set_pending_acquire(struct qspinlock *lo
extern void __pv_init_lock_hash(void);
extern void __pv_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
extern void __raw_callee_save___pv_queued_spin_unlock(struct qspinlock *lock);
+extern bool nopvspin;
#define queued_spin_unlock queued_spin_unlock
/**
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 249f14a..e9c76d8 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -825,18 +825,44 @@ __visible bool __kvm_vcpu_is_preempted(long cpu)
*/
void __init kvm_spinlock_init(void)
{
- /* Does host kernel support KVM_FEATURE_PV_UNHALT? */
- if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
+ /*
+ * PV spinlocks are disabled if there is no host side support, and the
+ * native qspinlock is used instead. As the native qspinlock is a fair
+ * lock, using it in a guest suffers from lock holder preemption: imagine
+ * one pCPU running 10 vCPUs of the same guest contending the same lock.
+ *
+ * virt_spin_lock() is introduced as an optimization for that scenario
+ * which is enabled by virt_spin_lock_key key. To use that optimization,
+ * virt_spin_lock_key isn't disabled here.
+ */
+ if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT)) {
+ pr_info("PV spinlocks disabled, no host support.\n");
return;
+ }
+ /*
+ * Disable PV qspinlock and use native qspinlock when dedicated pCPUs
+ * are available.
+ */
if (kvm_para_has_hint(KVM_HINTS_REALTIME)) {
+ pr_info("PV spinlocks disabled with KVM_HINTS_REALTIME hints.\n");
+ static_branch_disable(&virt_spin_lock_key);
+ return;
+ }
+
+ if (num_possible_cpus() == 1) {
+ pr_info("PV spinlocks disabled, single CPU.\n");
static_branch_disable(&virt_spin_lock_key);
return;
}
- /* Don't use the pvqspinlock code if there is only 1 vCPU. */
- if (num_possible_cpus() == 1)
+ if (nopvspin) {
+ pr_info("PV spinlocks disabled, forced by \"nopvspin\" parameter.\n");
+ static_branch_disable(&virt_spin_lock_key);
return;
+ }
+
+ pr_info("PV spinlocks enabled\n");
__pv_init_lock_hash();
pv_ops.lock.queued_spin_lock_slowpath = __pv_queued_spin_lock_slowpath;
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 2473f10..75193d6 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -580,4 +580,11 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
#include "qspinlock_paravirt.h"
#include "qspinlock.c"
+bool nopvspin __initdata;
+static __init int parse_nopvspin(char *arg)
+{
+ nopvspin = true;
+ return 0;
+}
+early_param("nopvspin", parse_nopvspin);
#endif
--
1.8.3.1
^ permalink raw reply related
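The order of the checks in kvm_spinlock_init() after patch 3/5 above can be summarised with a small userspace model. The enum and function names are hypothetical; the model records only which branch is taken and whether static_branch_disable(&virt_spin_lock_key) would be called, not the actual lock behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the possible outcomes. */
enum spin_mode {
	SPIN_NATIVE_KEY_UNTOUCHED,	/* no host support, key left as is */
	SPIN_NATIVE_KEY_DISABLED,	/* virt_spin_lock_key disabled */
	SPIN_PV				/* PV slow path installed */
};

/* Mirrors the order of the checks in the patched kvm_spinlock_init(). */
static enum spin_mode kvm_spin_mode(bool has_pv_unhalt, bool hint_realtime,
				    unsigned int possible_cpus, bool nopvspin)
{
	/* KVM_FEATURE_PV_UNHALT missing: return without touching the key. */
	if (!has_pv_unhalt)
		return SPIN_NATIVE_KEY_UNTOUCHED;

	/* KVM_HINTS_REALTIME: dedicated pCPUs, use the native qspinlock. */
	if (hint_realtime)
		return SPIN_NATIVE_KEY_DISABLED;

	/* A single possible CPU does not need the PV code. */
	if (possible_cpus == 1)
		return SPIN_NATIVE_KEY_DISABLED;

	/* Explicit request on the kernel command line. */
	if (nopvspin)
		return SPIN_NATIVE_KEY_DISABLED;

	return SPIN_PV;
}

/* Hypothetical self-check walking through each branch. */
static void check_spin_modes(void)
{
	assert(kvm_spin_mode(true, false, 4, false) == SPIN_PV);
	assert(kvm_spin_mode(true, true, 4, false) == SPIN_NATIVE_KEY_DISABLED);
	assert(kvm_spin_mode(true, false, 1, false) == SPIN_NATIVE_KEY_DISABLED);
	assert(kvm_spin_mode(true, false, 4, true) == SPIN_NATIVE_KEY_DISABLED);
	/* Without host support, "nopvspin" is never reached. */
	assert(kvm_spin_mode(false, false, 4, true) == SPIN_NATIVE_KEY_UNTOUCHED);
}
```

The last assertion reflects the comment in the patch: without KVM_FEATURE_PV_UNHALT the function returns before the "nopvspin" check, so virt_spin_lock_key is not disabled on that path.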