Linux-HyperV List
 help / color / mirror / Atom feed
* RE: [PATCH v3 0/7] PCI: PM: Move to D0 before calling pci_legacy_resume_early()
From: Dexuan Cui @ 2019-10-15 18:24 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J . Wysocki, Lorenzo Pieralisi, Michael Kelley,
	Sasha Levin, Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
	olaf@aepfle.de, apw@canonical.com, jasowang@redhat.com, vkuznets,
	marcelo.cerri@canonical.com, jackm@mellanox.com,
	linux-pci@vger.kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	driverdev-devel@linuxdriverproject.org, Bjorn Helgaas
In-Reply-To: <20191014230016.240912-1-helgaas@kernel.org>

> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: Monday, October 14, 2019 4:00 PM
>  ...
> 
> Dexuan, the important thing here is the first patch, which is your [1],
> which I modified by doing pci_restore_state() as well as setting to D0:
> 
>   pci_set_power_state(pci_dev, PCI_D0);
>   pci_restore_state(pci_dev);
> 
> I'm proposing some more patches on top.  None are relevant to the problem
> you're solving; they're just minor doc and other updates in the same area.
> 
> Rafael, if you have a chance to look at these, I'd appreciate it.  I tried
> to make the doc match the code, but I'm no PM expert.
 
Thank you very much, Bjorn! The patchset looks good to me.

Thanks,
-- Dexuan

^ permalink raw reply

* Re: [PATCH 7/7] PCI/MSI: Move power state check out of pci_msi_supported()
From: Rafael J. Wysocki @ 2019-10-15 17:24 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Dexuan Cui, Rafael J . Wysocki, Lorenzo Pieralisi, Michael Kelley,
	Sasha Levin, Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
	olaf, apw, jasowang, vkuznets, marcelo.cerri, jackm, linux-pci,
	linux-hyperv, linux-kernel, driverdev-devel, Bjorn Helgaas
In-Reply-To: <20191014230016.240912-8-helgaas@kernel.org>

On Tuesday, October 15, 2019 1:00:16 AM CEST Bjorn Helgaas wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> 27e20603c54b ("PCI/MSI: Move D0 check into pci_msi_check_device()")
> moved the power state check into pci_msi_check_device(), which was
> subsequently renamed to pci_msi_supported().  This didn't change the
> behavior, since both callers checked the power state.
> 
> However, it doesn't fit the current "pci_msi_supported()" name, which
> should return what the device is capable of, independent of the power
> state.
> 
> Move the power state check back into the callers for readability.  No
> functional change intended.
> 
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

No issues found, so

Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> ---
>  drivers/pci/msi.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 0884bedcfc7a..20e9c729617c 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -861,7 +861,7 @@ static int pci_msi_supported(struct pci_dev *dev, int nvec)
>  	if (!pci_msi_enable)
>  		return 0;
>  
> -	if (!dev || dev->no_msi || dev->current_state != PCI_D0)
> +	if (!dev || dev->no_msi)
>  		return 0;
>  
>  	/*
> @@ -972,7 +972,7 @@ static int __pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries,
>  	int nr_entries;
>  	int i, j;
>  
> -	if (!pci_msi_supported(dev, nvec))
> +	if (!pci_msi_supported(dev, nvec) || dev->current_state != PCI_D0)
>  		return -EINVAL;
>  
>  	nr_entries = pci_msix_vec_count(dev);
> @@ -1058,7 +1058,7 @@ static int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
>  	int nvec;
>  	int rc;
>  
> -	if (!pci_msi_supported(dev, minvec))
> +	if (!pci_msi_supported(dev, minvec) || dev->current_state != PCI_D0)
>  		return -EINVAL;
>  
>  	/* Check whether driver already requested MSI-X IRQs */
> 





^ permalink raw reply

* Re: [PATCH 6/7] PCI/PM: Wrap long lines in documentation
From: Rafael J. Wysocki @ 2019-10-15 17:23 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Dexuan Cui, Rafael J . Wysocki, Lorenzo Pieralisi, Michael Kelley,
	Sasha Levin, Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
	olaf, apw, jasowang, vkuznets, marcelo.cerri, jackm, linux-pci,
	linux-hyperv, linux-kernel, driverdev-devel, Bjorn Helgaas
In-Reply-To: <20191014230016.240912-7-helgaas@kernel.org>

On Tuesday, October 15, 2019 1:00:15 AM CEST Bjorn Helgaas wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> Documentation/power/pci.rst is wrapped to fit in 80 columns, but directory
> structure changes made a few lines longer.  Wrap them so they all fit in 80
> columns again.
> 
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Well, looks better this way. :-)

Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> ---
>  Documentation/power/pci.rst | 28 ++++++++++++++--------------
>  1 file changed, 14 insertions(+), 14 deletions(-)
> 
> diff --git a/Documentation/power/pci.rst b/Documentation/power/pci.rst
> index 1525c594d631..db41a770a2f5 100644
> --- a/Documentation/power/pci.rst
> +++ b/Documentation/power/pci.rst
> @@ -426,12 +426,12 @@ pm->runtime_idle() callback.
>  2.4. System-Wide Power Transitions
>  ----------------------------------
>  There are a few different types of system-wide power transitions, described in
> -Documentation/driver-api/pm/devices.rst.  Each of them requires devices to be handled
> -in a specific way and the PM core executes subsystem-level power management
> -callbacks for this purpose.  They are executed in phases such that each phase
> -involves executing the same subsystem-level callback for every device belonging
> -to the given subsystem before the next phase begins.  These phases always run
> -after tasks have been frozen.
> +Documentation/driver-api/pm/devices.rst.  Each of them requires devices to be
> +handled in a specific way and the PM core executes subsystem-level power
> +management callbacks for this purpose.  They are executed in phases such that
> +each phase involves executing the same subsystem-level callback for every device
> +belonging to the given subsystem before the next phase begins.  These phases
> +always run after tasks have been frozen.
>  
>  2.4.1. System Suspend
>  ^^^^^^^^^^^^^^^^^^^^^
> @@ -636,12 +636,12 @@ System restore requires a hibernation image to be loaded into memory and the
>  pre-hibernation memory contents to be restored before the pre-hibernation system
>  activity can be resumed.
>  
> -As described in Documentation/driver-api/pm/devices.rst, the hibernation image is loaded
> -into memory by a fresh instance of the kernel, called the boot kernel, which in
> -turn is loaded and run by a boot loader in the usual way.  After the boot kernel
> -has loaded the image, it needs to replace its own code and data with the code
> -and data of the "hibernated" kernel stored within the image, called the image
> -kernel.  For this purpose all devices are frozen just like before creating
> +As described in Documentation/driver-api/pm/devices.rst, the hibernation image
> +is loaded into memory by a fresh instance of the kernel, called the boot kernel,
> +which in turn is loaded and run by a boot loader in the usual way.  After the
> +boot kernel has loaded the image, it needs to replace its own code and data with
> +the code and data of the "hibernated" kernel stored within the image, called the
> +image kernel.  For this purpose all devices are frozen just like before creating
>  the image during hibernation, in the
>  
>  	prepare, freeze, freeze_noirq
> @@ -691,8 +691,8 @@ controlling the runtime power management of their devices.
>  
>  At the time of this writing there are two ways to define power management
>  callbacks for a PCI device driver, the recommended one, based on using a
> -dev_pm_ops structure described in Documentation/driver-api/pm/devices.rst, and the
> -"legacy" one, in which the .suspend(), .suspend_late(), .resume_early(), and
> +dev_pm_ops structure described in Documentation/driver-api/pm/devices.rst, and
> +the "legacy" one, in which the .suspend(), .suspend_late(), .resume_early(), and
>  .resume() callbacks from struct pci_driver are used.  The legacy approach,
>  however, doesn't allow one to define runtime power management callbacks and is
>  not really suitable for any new drivers.  Therefore it is not covered by this
> 





^ permalink raw reply

* Re: [PATCH 5/7] PCI/PM: Make power management op coding style consistent
From: Rafael J. Wysocki @ 2019-10-15 17:22 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Dexuan Cui, Rafael J . Wysocki, Lorenzo Pieralisi, Michael Kelley,
	Sasha Levin, Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
	olaf, apw, jasowang, vkuznets, marcelo.cerri, jackm, linux-pci,
	linux-hyperv, linux-kernel, driverdev-devel, Bjorn Helgaas
In-Reply-To: <20191014230016.240912-6-helgaas@kernel.org>

On Tuesday, October 15, 2019 1:00:14 AM CEST Bjorn Helgaas wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> Some of the power management ops use this style:
> 
>   struct device_driver *drv = dev->driver;
>   if (drv && drv->pm && drv->pm->prepare(dev))
>     drv->pm->prepare(dev);
> 
> while others use this:
> 
>   const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
>   if (pm && pm->runtime_resume)
>     pm->runtime_resume(dev);
> 
> Convert the first style to the second so they're all consistent.  Remove
> local "error" variables when unnecessary.  No functional change intended.
> 
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

No issues found, so

Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> ---
>  drivers/pci/pci-driver.c | 76 +++++++++++++++++++---------------------
>  1 file changed, 36 insertions(+), 40 deletions(-)
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 55acb658273f..abbf5c39cb9c 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -679,11 +679,11 @@ static bool pci_has_legacy_pm_support(struct pci_dev *pci_dev)
>  
>  static int pci_pm_prepare(struct device *dev)
>  {
> -	struct device_driver *drv = dev->driver;
>  	struct pci_dev *pci_dev = to_pci_dev(dev);
> +	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
>  
> -	if (drv && drv->pm && drv->pm->prepare) {
> -		int error = drv->pm->prepare(dev);
> +	if (pm && pm->prepare) {
> +		int error = pm->prepare(dev);
>  		if (error < 0)
>  			return error;
>  
> @@ -917,8 +917,7 @@ static int pci_pm_suspend_noirq(struct device *dev)
>  static int pci_pm_resume_noirq(struct device *dev)
>  {
>  	struct pci_dev *pci_dev = to_pci_dev(dev);
> -	struct device_driver *drv = dev->driver;
> -	int error = 0;
> +	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
>  
>  	if (dev_pm_may_skip_resume(dev))
>  		return 0;
> @@ -946,17 +945,16 @@ static int pci_pm_resume_noirq(struct device *dev)
>  	if (pci_has_legacy_pm_support(pci_dev))
>  		return pci_legacy_resume_early(dev);
>  
> -	if (drv && drv->pm && drv->pm->resume_noirq)
> -		error = drv->pm->resume_noirq(dev);
> +	if (pm && pm->resume_noirq)
> +		return pm->resume_noirq(dev);
>  
> -	return error;
> +	return 0;
>  }
>  
>  static int pci_pm_resume(struct device *dev)
>  {
>  	struct pci_dev *pci_dev = to_pci_dev(dev);
>  	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
> -	int error = 0;
>  
>  	/*
>  	 * This is necessary for the suspend error path in which resume is
> @@ -972,12 +970,12 @@ static int pci_pm_resume(struct device *dev)
>  
>  	if (pm) {
>  		if (pm->resume)
> -			error = pm->resume(dev);
> +			return pm->resume(dev);
>  	} else {
>  		pci_pm_reenable_device(pci_dev);
>  	}
>  
> -	return error;
> +	return 0;
>  }
>  
>  #else /* !CONFIG_SUSPEND */
> @@ -1038,16 +1036,16 @@ static int pci_pm_freeze(struct device *dev)
>  static int pci_pm_freeze_noirq(struct device *dev)
>  {
>  	struct pci_dev *pci_dev = to_pci_dev(dev);
> -	struct device_driver *drv = dev->driver;
> +	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
>  
>  	if (pci_has_legacy_pm_support(pci_dev))
>  		return pci_legacy_suspend_late(dev, PMSG_FREEZE);
>  
> -	if (drv && drv->pm && drv->pm->freeze_noirq) {
> +	if (pm && pm->freeze_noirq) {
>  		int error;
>  
> -		error = drv->pm->freeze_noirq(dev);
> -		suspend_report_result(drv->pm->freeze_noirq, error);
> +		error = pm->freeze_noirq(dev);
> +		suspend_report_result(pm->freeze_noirq, error);
>  		if (error)
>  			return error;
>  	}
> @@ -1066,8 +1064,8 @@ static int pci_pm_freeze_noirq(struct device *dev)
>  static int pci_pm_thaw_noirq(struct device *dev)
>  {
>  	struct pci_dev *pci_dev = to_pci_dev(dev);
> -	struct device_driver *drv = dev->driver;
> -	int error = 0;
> +	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
> +	int error;
>  
>  	if (pcibios_pm_ops.thaw_noirq) {
>  		error = pcibios_pm_ops.thaw_noirq(dev);
> @@ -1091,10 +1089,10 @@ static int pci_pm_thaw_noirq(struct device *dev)
>  	if (pci_has_legacy_pm_support(pci_dev))
>  		return pci_legacy_resume_early(dev);
>  
> -	if (drv && drv->pm && drv->pm->thaw_noirq)
> -		error = drv->pm->thaw_noirq(dev);
> +	if (pm && pm->thaw_noirq)
> +		return pm->thaw_noirq(dev);
>  
> -	return error;
> +	return 0;
>  }
>  
>  static int pci_pm_thaw(struct device *dev)
> @@ -1165,24 +1163,24 @@ static int pci_pm_poweroff_late(struct device *dev)
>  static int pci_pm_poweroff_noirq(struct device *dev)
>  {
>  	struct pci_dev *pci_dev = to_pci_dev(dev);
> -	struct device_driver *drv = dev->driver;
> +	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
>  
>  	if (dev_pm_smart_suspend_and_suspended(dev))
>  		return 0;
>  
> -	if (pci_has_legacy_pm_support(to_pci_dev(dev)))
> +	if (pci_has_legacy_pm_support(pci_dev))
>  		return pci_legacy_suspend_late(dev, PMSG_HIBERNATE);
>  
> -	if (!drv || !drv->pm) {
> +	if (!pm) {
>  		pci_fixup_device(pci_fixup_suspend_late, pci_dev);
>  		return 0;
>  	}
>  
> -	if (drv->pm->poweroff_noirq) {
> +	if (pm->poweroff_noirq) {
>  		int error;
>  
> -		error = drv->pm->poweroff_noirq(dev);
> -		suspend_report_result(drv->pm->poweroff_noirq, error);
> +		error = pm->poweroff_noirq(dev);
> +		suspend_report_result(pm->poweroff_noirq, error);
>  		if (error)
>  			return error;
>  	}
> @@ -1208,8 +1206,8 @@ static int pci_pm_poweroff_noirq(struct device *dev)
>  static int pci_pm_restore_noirq(struct device *dev)
>  {
>  	struct pci_dev *pci_dev = to_pci_dev(dev);
> -	struct device_driver *drv = dev->driver;
> -	int error = 0;
> +	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
> +	int error;
>  
>  	if (pcibios_pm_ops.restore_noirq) {
>  		error = pcibios_pm_ops.restore_noirq(dev);
> @@ -1223,17 +1221,16 @@ static int pci_pm_restore_noirq(struct device *dev)
>  	if (pci_has_legacy_pm_support(pci_dev))
>  		return pci_legacy_resume_early(dev);
>  
> -	if (drv && drv->pm && drv->pm->restore_noirq)
> -		error = drv->pm->restore_noirq(dev);
> +	if (pm && pm->restore_noirq)
> +		return pm->restore_noirq(dev);
>  
> -	return error;
> +	return 0;
>  }
>  
>  static int pci_pm_restore(struct device *dev)
>  {
>  	struct pci_dev *pci_dev = to_pci_dev(dev);
>  	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
> -	int error = 0;
>  
>  	/*
>  	 * This is necessary for the hibernation error path in which restore is
> @@ -1249,12 +1246,12 @@ static int pci_pm_restore(struct device *dev)
>  
>  	if (pm) {
>  		if (pm->restore)
> -			error = pm->restore(dev);
> +			return pm->restore(dev);
>  	} else {
>  		pci_pm_reenable_device(pci_dev);
>  	}
>  
> -	return error;
> +	return 0;
>  }
>  
>  #else /* !CONFIG_HIBERNATE_CALLBACKS */
> @@ -1330,9 +1327,9 @@ static int pci_pm_runtime_suspend(struct device *dev)
>  
>  static int pci_pm_runtime_resume(struct device *dev)
>  {
> -	int rc = 0;
>  	struct pci_dev *pci_dev = to_pci_dev(dev);
>  	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
> +	int error = 0;
>  
>  	/*
>  	 * Restoring config space is necessary even if the device is not bound
> @@ -1348,18 +1345,17 @@ static int pci_pm_runtime_resume(struct device *dev)
>  	pci_pm_default_resume(pci_dev);
>  
>  	if (pm && pm->runtime_resume)
> -		rc = pm->runtime_resume(dev);
> +		error = pm->runtime_resume(dev);
>  
>  	pci_dev->runtime_d3cold = false;
>  
> -	return rc;
> +	return error;
>  }
>  
>  static int pci_pm_runtime_idle(struct device *dev)
>  {
>  	struct pci_dev *pci_dev = to_pci_dev(dev);
>  	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
> -	int ret = 0;
>  
>  	/*
>  	 * If pci_dev->driver is not set (unbound), the device should
> @@ -1372,9 +1368,9 @@ static int pci_pm_runtime_idle(struct device *dev)
>  		return -ENOSYS;
>  
>  	if (pm->runtime_idle)
> -		ret = pm->runtime_idle(dev);
> +		return pm->runtime_idle(dev);
>  
> -	return ret;
> +	return 0;
>  }
>  
>  static const struct dev_pm_ops pci_dev_pm_ops = {
> 





^ permalink raw reply

* Re: [PATCH 4/7] PCI/PM: Run resume fixups before disabling wakeup events
From: Rafael J. Wysocki @ 2019-10-15 17:21 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Dexuan Cui, Rafael J . Wysocki, Lorenzo Pieralisi, Michael Kelley,
	Sasha Levin, Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
	olaf, apw, jasowang, vkuznets, marcelo.cerri, jackm, linux-pci,
	linux-hyperv, linux-kernel, driverdev-devel, Bjorn Helgaas
In-Reply-To: <20191014230016.240912-5-helgaas@kernel.org>

On Tuesday, October 15, 2019 1:00:13 AM CEST Bjorn Helgaas wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> pci_pm_resume() and pci_pm_restore() call pci_pm_default_resume(), which
> runs resume fixups before disabling wakeup events:
> 
>   static void pci_pm_default_resume(struct pci_dev *pci_dev)
>   {
>     pci_fixup_device(pci_fixup_resume, pci_dev);
>     pci_enable_wake(pci_dev, PCI_D0, false);
>   }
> 
> pci_pm_runtime_resume() does both of these, but in the opposite order:
> 
>   pci_enable_wake(pci_dev, PCI_D0, false);
>   pci_fixup_device(pci_fixup_resume, pci_dev);
> 
> We should always use the same ordering unless there's a reason to do
> otherwise.

Right.

> Change pci_pm_runtime_resume() to call pci_pm_default_resume()
> instead of open-coding this, so the fixups are always done before disabling
> wakeup events.
> 
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

No concerns about this change, so

Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> ---
>  drivers/pci/pci-driver.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 0c3086793e4e..55acb658273f 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1345,8 +1345,7 @@ static int pci_pm_runtime_resume(struct device *dev)
>  		return 0;
>  
>  	pci_fixup_device(pci_fixup_resume_early, pci_dev);
> -	pci_enable_wake(pci_dev, PCI_D0, false);
> -	pci_fixup_device(pci_fixup_resume, pci_dev);
> +	pci_pm_default_resume(pci_dev);
>  
>  	if (pm && pm->runtime_resume)
>  		rc = pm->runtime_resume(dev);
> 





^ permalink raw reply

* Re: [PATCH 3/7] PCI/PM: Clear PCIe PME Status even for legacy power management
From: Rafael J. Wysocki @ 2019-10-15 17:20 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Dexuan Cui, Rafael J . Wysocki, Lorenzo Pieralisi, Michael Kelley,
	Sasha Levin, Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
	olaf, apw, jasowang, vkuznets, marcelo.cerri, jackm, linux-pci,
	linux-hyperv, linux-kernel, driverdev-devel, Bjorn Helgaas
In-Reply-To: <20191014230016.240912-4-helgaas@kernel.org>

On Tuesday, October 15, 2019 1:00:12 AM CEST Bjorn Helgaas wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> Previously, pci_pm_resume_noirq() cleared the PME Status bit in the Root
> Status register only if the device had no driver or the driver did not
> implement legacy power management.  It should clear PME Status regardless
> of what sort of power management the driver supports, so do this before
> checking for legacy power management.
> 
> This affects Root Ports and Root Complex Event Collectors, for which the
> usual driver is the PCIe portdrv, which implements new power management, so
> this change is just on principle, not to fix any actual defects.
> 
> Fixes: a39bd851dccf ("PCI/PM: Clear PCIe PME Status bit in core, not PCIe port driver")
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

This is a reasonable change in my view, so

Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> ---
>  drivers/pci/pci-driver.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index d4ac8ce8c1f9..0c3086793e4e 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -941,12 +941,11 @@ static int pci_pm_resume_noirq(struct device *dev)
>  		pci_pm_default_resume_early(pci_dev);
>  
>  	pci_fixup_device(pci_fixup_resume_early, pci_dev);
> +	pcie_pme_root_status_cleanup(pci_dev);
>  
>  	if (pci_has_legacy_pm_support(pci_dev))
>  		return pci_legacy_resume_early(dev);
>  
> -	pcie_pme_root_status_cleanup(pci_dev);
> -
>  	if (drv && drv->pm && drv->pm->resume_noirq)
>  		error = drv->pm->resume_noirq(dev);
>  
> 





^ permalink raw reply

* Re: [PATCH 2/7] PCI/PM: Correct pci_pm_thaw_noirq() documentation
From: Rafael J. Wysocki @ 2019-10-15 17:17 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Dexuan Cui, Rafael J . Wysocki, Lorenzo Pieralisi, Michael Kelley,
	Sasha Levin, Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
	olaf, apw, jasowang, vkuznets, marcelo.cerri, jackm, linux-pci,
	linux-hyperv, linux-kernel, driverdev-devel, Bjorn Helgaas
In-Reply-To: <20191014230016.240912-3-helgaas@kernel.org>

On Tuesday, October 15, 2019 1:00:11 AM CEST Bjorn Helgaas wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> According to the documentation, pci_pm_thaw_noirq() did not put the device
> into the full-power state and restore its standard configuration registers.
> This is incorrect, so update the documentation to match the code.
> 
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Right, the documentation is outdated, so

Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> ---
>  Documentation/power/pci.rst | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/power/pci.rst b/Documentation/power/pci.rst
> index 0e2ef7429304..1525c594d631 100644
> --- a/Documentation/power/pci.rst
> +++ b/Documentation/power/pci.rst
> @@ -600,17 +600,17 @@ using the following PCI bus type's callbacks::
>  
>  respectively.
>  
> -The first of them, pci_pm_thaw_noirq(), is analogous to pci_pm_resume_noirq(),
> -but it doesn't put the device into the full power state and doesn't attempt to
> -restore its standard configuration registers.  It also executes the device
> -driver's pm->thaw_noirq() callback, if defined, instead of pm->resume_noirq().
> +The first of them, pci_pm_thaw_noirq(), is analogous to pci_pm_resume_noirq().
> +It puts the device into the full power state and restores its standard
> +configuration registers.  It also executes the device driver's pm->thaw_noirq()
> +callback, if defined, instead of pm->resume_noirq().
>  
>  The pci_pm_thaw() routine is similar to pci_pm_resume(), but it runs the device
>  driver's pm->thaw() callback instead of pm->resume().  It is executed
>  asynchronously for different PCI devices that don't depend on each other in a
>  known way.
>  
> -The complete phase it the same as for system resume.
> +The complete phase is the same as for system resume.
>  
>  After saving the image, devices need to be powered down before the system can
>  enter the target sleep state (ACPI S4 for ACPI-based systems).  This is done in
> 





^ permalink raw reply

* Re: [PATCH 1/7] PCI/PM: Always return devices to D0 when thawing
From: Rafael J. Wysocki @ 2019-10-15 17:15 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Dexuan Cui, Rafael J . Wysocki, Lorenzo Pieralisi, Michael Kelley,
	Sasha Levin, Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
	olaf, apw, jasowang, vkuznets, marcelo.cerri, jackm, linux-pci,
	linux-hyperv, linux-kernel, driverdev-devel, Bjorn Helgaas,
	Linux PM
In-Reply-To: <20191014230016.240912-2-helgaas@kernel.org>

On Tuesday, October 15, 2019 1:00:10 AM CEST Bjorn Helgaas wrote:
> From: Dexuan Cui <decui@microsoft.com>
> 
> pci_pm_thaw_noirq() is supposed to return the device to D0 and restore its
> configuration registers, but previously it only did that for devices whose
> drivers implemented the new power management ops.
> 
> Hibernation, e.g., via "echo disk > /sys/power/state", involves freezing
> devices, creating a hibernation image, thawing devices, writing the image,
> and powering off.  The fact that thawing did not return devices with legacy
> power management to D0 caused errors, e.g., in this path:
> 
>   pci_pm_thaw_noirq
>     if (pci_has_legacy_pm_support(pci_dev)) # true for Mellanox VF driver
>       return pci_legacy_resume_early(dev)   # ... legacy PM skips the rest
>     pci_set_power_state(pci_dev, PCI_D0)
>     pci_restore_state(pci_dev)
>   pci_pm_thaw
>     if (pci_has_legacy_pm_support(pci_dev))
>       pci_legacy_resume
> 	drv->resume
> 	  mlx4_resume
> 	    ...
> 	      pci_enable_msix_range
> 	        ...
> 		  if (dev->current_state != PCI_D0)  # <---
> 		    return -EINVAL;
> 
> which caused these warnings:
> 
>   mlx4_core a6d1:00:02.0: INTx is not supported in multi-function mode, aborting
>   PM: dpm_run_callback(): pci_pm_thaw+0x0/0xd7 returns -95
>   PM: Device a6d1:00:02.0 failed to thaw: error -95
> 
> Return devices to D0 and restore config registers for all devices, not just
> those whose drivers support new power management.
> 
> [bhelgaas: also call pci_restore_state() before pci_legacy_resume_early(),
> update comment, add stable tag, commit log]
> Link: https://lore.kernel.org/r/KU1P153MB016637CAEAD346F0AA8E3801BFAD0@KU1P153MB0166.APCP153.PROD.OUTLOOK.COM
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> Cc: stable@vger.kernel.org	# v4.13+

No issues found, so

Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> ---
>  drivers/pci/pci-driver.c | 17 +++++++++++------
>  1 file changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index a8124e47bf6e..d4ac8ce8c1f9 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1076,17 +1076,22 @@ static int pci_pm_thaw_noirq(struct device *dev)
>  			return error;
>  	}
>  
> -	if (pci_has_legacy_pm_support(pci_dev))
> -		return pci_legacy_resume_early(dev);
> -
>  	/*
> -	 * pci_restore_state() requires the device to be in D0 (because of MSI
> -	 * restoration among other things), so force it into D0 in case the
> -	 * driver's "freeze" callbacks put it into a low-power state directly.
> +	 * Both the legacy ->resume_early() and the new pm->thaw_noirq()
> +	 * callbacks assume the device has been returned to D0 and its
> +	 * config state has been restored.
> +	 *
> +	 * In addition, pci_restore_state() restores MSI-X state in MMIO
> +	 * space, which requires the device to be in D0, so return it to D0
> +	 * in case the driver's "freeze" callbacks put it into a low-power
> +	 * state.
>  	 */
>  	pci_set_power_state(pci_dev, PCI_D0);
>  	pci_restore_state(pci_dev);
>  
> +	if (pci_has_legacy_pm_support(pci_dev))
> +		return pci_legacy_resume_early(dev);
> +
>  	if (drv && drv->pm && drv->pm->thaw_noirq)
>  		error = drv->pm->thaw_noirq(dev);
>  
> 





^ permalink raw reply

* Re: [PATCH v3 0/3] Drivers: hv: vmbus: Miscellaneous improvements
From: Wei Liu @ 2019-10-15 14:26 UTC (permalink / raw)
  To: Andrea Parri
  Cc: Linux Kernel List, Linux on Hyper-V List, K . Y . Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Sasha Levin, Michael Kelley,
	Vitaly Kuznetsov, Dexuan Cui, Wei Liu
In-Reply-To: <20191015114646.15354-1-parri.andrea@gmail.com>

On Tue, 15 Oct 2019 at 12:47, Andrea Parri <parri.andrea@gmail.com> wrote:
[...]
> [1] https://lkml.kernel.org/r/20191010154600.23875-1-parri.andrea@gmail.com
> [2] https://lkml.kernel.org/r/20191007163115.26197-1-parri.andrea@gmail.com
>
> Andrea Parri (3):
>   Drivers: hv: vmbus: Introduce table of VMBus protocol versions
>   Drivers: hv: vmbus: Enable VMBus protocol versions 4.1, 5.1 and 5.2
>   Drivers: hv: vmbus: Add module parameter to cap the VMBus version

For the whole series:

Reviewed-by: Wei Liu <wei.liu@kernel.org>

^ permalink raw reply

* RE: [PATCH v3 3/3] Drivers: hv: vmbus: Add module parameter to cap the VMBus version
From: Michael Kelley @ 2019-10-15 13:15 UTC (permalink / raw)
  To: Andrea Parri, linux-kernel@vger.kernel.org,
	linux-hyperv@vger.kernel.org
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	vkuznets, Dexuan Cui, Wei Liu
In-Reply-To: <20191015114646.15354-4-parri.andrea@gmail.com>

From: Andrea Parri <parri.andrea@gmail.com> Sent: Tuesday, October 15, 2019 4:47 AM
> 
> Currently, Linux guests negotiate the VMBus version with Hyper-V
> and use the highest available VMBus version they can connect to.
> This has some drawbacks: by using the highest available version,
> certain code paths are never executed and can not be tested when
> the guest runs on the newest host.
> 
> Add the module parameter "max_version", to upper-bound the VMBus
> versions guests can negotiate.
> 
> Suggested-by: Dexuan Cui <decui@microsoft.com>
> Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
> ---
>  drivers/hv/connection.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 

Reviewed-by: Michael Kelley <mikelley@microsoft.com>

^ permalink raw reply

* RE: [PATCH v3 2/3] Drivers: hv: vmbus: Enable VMBus protocol versions 4.1, 5.1 and 5.2
From: Michael Kelley @ 2019-10-15 13:12 UTC (permalink / raw)
  To: Andrea Parri, linux-kernel@vger.kernel.org,
	linux-hyperv@vger.kernel.org
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	vkuznets, Dexuan Cui, Wei Liu
In-Reply-To: <20191015114646.15354-3-parri.andrea@gmail.com>

From: Andrea Parri <parri.andrea@gmail.com> Sent: Tuesday, October 15, 2019 4:47 AM
> 
> Hyper-V has added VMBus protocol versions 5.1 and 5.2 in recent release
> versions.  Allow Linux guests to negotiate these new protocol versions
> on versions of Hyper-V that support them.  While on this, also allow
> guests to negotiate the VMBus protocol version 4.1 (which was missing).
> 
> Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
> ---
>  drivers/hv/connection.c | 13 ++++++++-----
>  include/linux/hyperv.h  |  8 +++++++-
>  2 files changed, 15 insertions(+), 6 deletions(-)
> 

Reviewed-by: Michael Kelley <mikelley@microsoft.com>

^ permalink raw reply

* RE: [PATCH v3 1/3] Drivers: hv: vmbus: Introduce table of VMBus protocol versions
From: Michael Kelley @ 2019-10-15 13:08 UTC (permalink / raw)
  To: Andrea Parri, linux-kernel@vger.kernel.org,
	linux-hyperv@vger.kernel.org
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	vkuznets, Dexuan Cui, Wei Liu
In-Reply-To: <20191015114646.15354-2-parri.andrea@gmail.com>

From: Andrea Parri <parri.andrea@gmail.com> Sent: Tuesday, October 15, 2019 4:47 AM
> 
> The technique used to get the next VMBus version seems increasisly
> clumsy as the number of VMBus versions increases.  Performance is
> not a concern since this is only done once during system boot; it's
> just that we'll end up with more lines of code than is really needed.
> 
> As an alternative, introduce a table with the version numbers listed
> in order (from the most recent to the oldest).  vmbus_connect() loops
> through the versions listed in the table until it gets an accepted
> connection or gets to the end of the table (invalid version).
> 
> Suggested-by: Michael Kelley <mikelley@microsoft.com>
> Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
> ---
>  drivers/hv/connection.c | 50 +++++++++++++++--------------------------
>  drivers/hv/vmbus_drv.c  |  3 +--
>  include/linux/hyperv.h  |  4 ----
>  3 files changed, 19 insertions(+), 38 deletions(-)
> 

Reviewed-by: Michael Kelley <mikelley@microsoft.com>

^ permalink raw reply

* RE: [PATCH v2] x86/hyperv: Set pv_info.name to "Hyper-V"
From: Michael Kelley @ 2019-10-15 13:06 UTC (permalink / raw)
  To: Andrea Parri, linux-kernel@vger.kernel.org,
	linux-hyperv@vger.kernel.org, x86@kernel.org
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	vkuznets, Dexuan Cui, Wei Liu
In-Reply-To: <20191015103502.13156-1-parri.andrea@gmail.com>

From: Andrea Parri <parri.andrea@gmail.com> Sent: Tuesday, October 15, 2019 3:35 AM
> 
> Michael reported that the x86/hyperv initialization code printed the
> following dmesg when running in a VM on Hyper-V:
> 
>   [    0.000738] Booting paravirtualized kernel on bare hardware
> 
> Let the x86/hyperv initialization code set pv_info.name to "Hyper-V";
> with this addition, the dmesg read:
> 
>   [    0.000172] Booting paravirtualized kernel on Hyper-V
> 
> Reported-by: Michael Kelley <mikelley@microsoft.com>
> Signed-off-by: Andrea Parri <parri.andrea@gmail.com>

Reviewed-by: Michael Kelley <mikelley@microsoft.com>

^ permalink raw reply

* [PATCH v3 3/3] Drivers: hv: vmbus: Add module parameter to cap the VMBus version
From: Andrea Parri @ 2019-10-15 11:46 UTC (permalink / raw)
  To: linux-kernel, linux-hyperv
  Cc: K . Y . Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	Michael Kelley, Vitaly Kuznetsov, Dexuan Cui, Wei Liu,
	Andrea Parri
In-Reply-To: <20191015114646.15354-1-parri.andrea@gmail.com>

Currently, Linux guests negotiate the VMBus version with Hyper-V
and use the highest available VMBus version they can connect to.
This has some drawbacks: by using the highest available version,
certain code paths are never executed and can not be tested when
the guest runs on the newest host.

Add the module parameter "max_version", to upper-bound the VMBus
versions guests can negotiate.

Suggested-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
---
 drivers/hv/connection.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index cadfb34b38d80..0475be4356dd7 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -14,6 +14,7 @@
 #include <linux/wait.h>
 #include <linux/delay.h>
 #include <linux/mm.h>
+#include <linux/module.h>
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
 #include <linux/hyperv.h>
@@ -55,6 +56,16 @@ static __u32 vmbus_versions[] = {
 	VERSION_WS2008
 };
 
+/*
+ * Maximal VMBus protocol version guests can negotiate.  Useful to cap the
+ * VMBus version for testing and debugging purpose.
+ */
+static uint max_version = VERSION_WIN10_V5_2;
+
+module_param(max_version, uint, S_IRUGO);
+MODULE_PARM_DESC(max_version,
+		 "Maximal VMBus protocol version which can be negotiated");
+
 int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
 {
 	int ret = 0;
@@ -240,6 +251,8 @@ int vmbus_connect(void)
 			goto cleanup;
 
 		version = vmbus_versions[i];
+		if (version > max_version)
+			continue;
 
 		ret = vmbus_negotiate_version(msginfo, version);
 		if (ret == -ETIMEDOUT)
-- 
2.23.0


^ permalink raw reply related

* [PATCH v3 2/3] Drivers: hv: vmbus: Enable VMBus protocol versions 4.1, 5.1 and 5.2
From: Andrea Parri @ 2019-10-15 11:46 UTC (permalink / raw)
  To: linux-kernel, linux-hyperv
  Cc: K . Y . Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	Michael Kelley, Vitaly Kuznetsov, Dexuan Cui, Wei Liu,
	Andrea Parri
In-Reply-To: <20191015114646.15354-1-parri.andrea@gmail.com>

Hyper-V has added VMBus protocol versions 5.1 and 5.2 in recent release
versions.  Allow Linux guests to negotiate these new protocol versions
on versions of Hyper-V that support them.  While on this, also allow
guests to negotiate the VMBus protocol version 4.1 (which was missing).

Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
---
 drivers/hv/connection.c | 13 ++++++++-----
 include/linux/hyperv.h  |  8 +++++++-
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 8dc48f53c1ac4..cadfb34b38d80 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -44,7 +44,10 @@ EXPORT_SYMBOL_GPL(vmbus_proto_version);
  * Table of VMBus versions listed from newest to oldest.
  */
 static __u32 vmbus_versions[] = {
+	VERSION_WIN10_V5_2,
+	VERSION_WIN10_V5_1,
 	VERSION_WIN10_V5,
+	VERSION_WIN10_V4_1,
 	VERSION_WIN10,
 	VERSION_WIN8_1,
 	VERSION_WIN8,
@@ -68,12 +71,12 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
 	msg->vmbus_version_requested = version;
 
 	/*
-	 * VMBus protocol 5.0 (VERSION_WIN10_V5) requires that we must use
-	 * VMBUS_MESSAGE_CONNECTION_ID_4 for the Initiate Contact Message,
+	 * VMBus protocol 5.0 (VERSION_WIN10_V5) and higher require that we must
+	 * use VMBUS_MESSAGE_CONNECTION_ID_4 for the Initiate Contact Message,
 	 * and for subsequent messages, we must use the Message Connection ID
 	 * field in the host-returned Version Response Message. And, with
-	 * VERSION_WIN10_V5, we don't use msg->interrupt_page, but we tell
-	 * the host explicitly that we still use VMBUS_MESSAGE_SINT(2) for
+	 * VERSION_WIN10_V5 and higher, we don't use msg->interrupt_page, but we
+	 * tell the host explicitly that we still use VMBUS_MESSAGE_SINT(2) for
 	 * compatibility.
 	 *
 	 * On old hosts, we should always use VMBUS_MESSAGE_CONNECTION_ID (1).
@@ -399,7 +402,7 @@ int vmbus_post_msg(void *buffer, size_t buflen, bool can_sleep)
 		case HV_STATUS_INVALID_CONNECTION_ID:
 			/*
 			 * See vmbus_negotiate_version(): VMBus protocol 5.0
-			 * requires that we must use
+			 * and higher require that we must use
 			 * VMBUS_MESSAGE_CONNECTION_ID_4 for the Initiate
 			 * Contact message, but on old hosts that only
 			 * support VMBus protocol 4.0 or lower, here we get
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index c08b62dbd151f..f17f2cd22e39f 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -182,15 +182,21 @@ static inline u32 hv_get_avail_to_write_percent(
  * 2 . 4  (Windows 8)
  * 3 . 0  (Windows 8 R2)
  * 4 . 0  (Windows 10)
+ * 4 . 1  (Windows 10 RS3)
  * 5 . 0  (Newer Windows 10)
+ * 5 . 1  (Windows 10 RS4)
+ * 5 . 2  (Windows Server 2019, RS5)
  */
 
 #define VERSION_WS2008  ((0 << 16) | (13))
 #define VERSION_WIN7    ((1 << 16) | (1))
 #define VERSION_WIN8    ((2 << 16) | (4))
 #define VERSION_WIN8_1    ((3 << 16) | (0))
-#define VERSION_WIN10	((4 << 16) | (0))
+#define VERSION_WIN10 ((4 << 16) | (0))
+#define VERSION_WIN10_V4_1 ((4 << 16) | (1))
 #define VERSION_WIN10_V5 ((5 << 16) | (0))
+#define VERSION_WIN10_V5_1 ((5 << 16) | (1))
+#define VERSION_WIN10_V5_2 ((5 << 16) | (2))
 
 /* Make maximum size of pipe payload of 16K */
 #define MAX_PIPE_DATA_PAYLOAD		(sizeof(u8) * 16384)
-- 
2.23.0


^ permalink raw reply related

* [PATCH v3 1/3] Drivers: hv: vmbus: Introduce table of VMBus protocol versions
From: Andrea Parri @ 2019-10-15 11:46 UTC (permalink / raw)
  To: linux-kernel, linux-hyperv
  Cc: K . Y . Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	Michael Kelley, Vitaly Kuznetsov, Dexuan Cui, Wei Liu,
	Andrea Parri
In-Reply-To: <20191015114646.15354-1-parri.andrea@gmail.com>

The technique used to get the next VMBus version seems increasisly
clumsy as the number of VMBus versions increases.  Performance is
not a concern since this is only done once during system boot; it's
just that we'll end up with more lines of code than is really needed.

As an alternative, introduce a table with the version numbers listed
in order (from the most recent to the oldest).  vmbus_connect() loops
through the versions listed in the table until it gets an accepted
connection or gets to the end of the table (invalid version).

Suggested-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
---
 drivers/hv/connection.c | 50 +++++++++++++++--------------------------
 drivers/hv/vmbus_drv.c  |  3 +--
 include/linux/hyperv.h  |  4 ----
 3 files changed, 19 insertions(+), 38 deletions(-)

diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 6e4c015783ffc..8dc48f53c1ac4 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -40,29 +40,17 @@ EXPORT_SYMBOL_GPL(vmbus_connection);
 __u32 vmbus_proto_version;
 EXPORT_SYMBOL_GPL(vmbus_proto_version);
 
-static __u32 vmbus_get_next_version(__u32 current_version)
-{
-	switch (current_version) {
-	case (VERSION_WIN7):
-		return VERSION_WS2008;
-
-	case (VERSION_WIN8):
-		return VERSION_WIN7;
-
-	case (VERSION_WIN8_1):
-		return VERSION_WIN8;
-
-	case (VERSION_WIN10):
-		return VERSION_WIN8_1;
-
-	case (VERSION_WIN10_V5):
-		return VERSION_WIN10;
-
-	case (VERSION_WS2008):
-	default:
-		return VERSION_INVAL;
-	}
-}
+/*
+ * Table of VMBus versions listed from newest to oldest.
+ */
+static __u32 vmbus_versions[] = {
+	VERSION_WIN10_V5,
+	VERSION_WIN10,
+	VERSION_WIN8_1,
+	VERSION_WIN8,
+	VERSION_WIN7,
+	VERSION_WS2008
+};
 
 int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
 {
@@ -169,8 +157,8 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
  */
 int vmbus_connect(void)
 {
-	int ret = 0;
 	struct vmbus_channel_msginfo *msginfo = NULL;
+	int i, ret = 0;
 	__u32 version;
 
 	/* Initialize the vmbus connection */
@@ -244,21 +232,19 @@ int vmbus_connect(void)
 	 * version.
 	 */
 
-	version = VERSION_CURRENT;
+	for (i = 0; ; i++) {
+		if (i == ARRAY_SIZE(vmbus_versions))
+			goto cleanup;
+
+		version = vmbus_versions[i];
 
-	do {
 		ret = vmbus_negotiate_version(msginfo, version);
 		if (ret == -ETIMEDOUT)
 			goto cleanup;
 
 		if (vmbus_connection.conn_state == CONNECTED)
 			break;
-
-		version = vmbus_get_next_version(version);
-	} while (version != VERSION_INVAL);
-
-	if (version == VERSION_INVAL)
-		goto cleanup;
+	}
 
 	vmbus_proto_version = version;
 	pr_info("Vmbus version:%d.%d\n",
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 53a60c81e220d..0ac874faf7209 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -2220,8 +2220,7 @@ static int vmbus_bus_resume(struct device *dev)
 	 * We only use the 'vmbus_proto_version', which was in use before
 	 * hibernation, to re-negotiate with the host.
 	 */
-	if (vmbus_proto_version == VERSION_INVAL ||
-	    vmbus_proto_version == 0) {
+	if (!vmbus_proto_version) {
 		pr_err("Invalid proto version = 0x%x\n", vmbus_proto_version);
 		return -EINVAL;
 	}
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index b4a017093b697..c08b62dbd151f 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -192,10 +192,6 @@ static inline u32 hv_get_avail_to_write_percent(
 #define VERSION_WIN10	((4 << 16) | (0))
 #define VERSION_WIN10_V5 ((5 << 16) | (0))
 
-#define VERSION_INVAL -1
-
-#define VERSION_CURRENT VERSION_WIN10_V5
-
 /* Make maximum size of pipe payload of 16K */
 #define MAX_PIPE_DATA_PAYLOAD		(sizeof(u8) * 16384)
 
-- 
2.23.0


^ permalink raw reply related

* [PATCH v3 0/3] Drivers: hv: vmbus: Miscellaneous improvements
From: Andrea Parri @ 2019-10-15 11:46 UTC (permalink / raw)
  To: linux-kernel, linux-hyperv
  Cc: K . Y . Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	Michael Kelley, Vitaly Kuznetsov, Dexuan Cui, Wei Liu,
	Andrea Parri

Hi all,

The patchset:

  - refactors the VMBus negotiation code by introducing the table of
    VMBus protocol versions (patch 1/3),

  - enables VMBus protocol version 4.1, 5.1 and 5.2 (patch 2/3),

  - introduces a module parameter to cap the VMBus protocol versions
    which a guest can negotiate with the hypervisor (patch 3/3).

Thanks,
  Andrea

---
Changes since v2 ([1]):
  - refactor the loop exit path in vmbus_connect() (Michael Kelley)
  - do not rename VERSION_WIN10 (Michael Kelley)

Changes since v1 ([2]):
  - remove the VERSION_INVAL macro (Vitaly Kuznetsov and Dexuan Cui)
  - make the table of VMBus protocol versions static (Dexuan Cui)
  - enable VMBus protocol version 4.1 (Michael Kelley)
  - introduce module parameter to cap the VMBus version (Dexuan Cui)

[1] https://lkml.kernel.org/r/20191010154600.23875-1-parri.andrea@gmail.com
[2] https://lkml.kernel.org/r/20191007163115.26197-1-parri.andrea@gmail.com

Andrea Parri (3):
  Drivers: hv: vmbus: Introduce table of VMBus protocol versions
  Drivers: hv: vmbus: Enable VMBus protocol versions 4.1, 5.1 and 5.2
  Drivers: hv: vmbus: Add module parameter to cap the VMBus version

 drivers/hv/connection.c | 72 +++++++++++++++++++++--------------------
 drivers/hv/vmbus_drv.c  |  3 +-
 include/linux/hyperv.h  | 12 ++++---
 3 files changed, 45 insertions(+), 42 deletions(-)

-- 
2.23.0


^ permalink raw reply

* Re: [PATCH v2] x86/hyperv: Set pv_info.name to "Hyper-V"
From: Wei Liu @ 2019-10-15 10:56 UTC (permalink / raw)
  To: Andrea Parri
  Cc: Linux Kernel List, Linux on Hyper-V List, x86, K . Y . Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Sasha Levin, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H . Peter Anvin, Michael Kelley,
	Vitaly Kuznetsov, Dexuan Cui, Wei Liu
In-Reply-To: <20191015103502.13156-1-parri.andrea@gmail.com>

On Tue, 15 Oct 2019 at 11:35, Andrea Parri <parri.andrea@gmail.com> wrote:
>
> Michael reported that the x86/hyperv initialization code printed the
> following dmesg when running in a VM on Hyper-V:
>
>   [    0.000738] Booting paravirtualized kernel on bare hardware
>
> Let the x86/hyperv initialization code set pv_info.name to "Hyper-V";
> with this addition, the dmesg read:
>
>   [    0.000172] Booting paravirtualized kernel on Hyper-V
>
> Reported-by: Michael Kelley <mikelley@microsoft.com>
> Signed-off-by: Andrea Parri <parri.andrea@gmail.com>

Reviewed-by: Wei Liu <wei.liu@kernel.org>

^ permalink raw reply

* [PATCH v2] x86/hyperv: Set pv_info.name to "Hyper-V"
From: Andrea Parri @ 2019-10-15 10:35 UTC (permalink / raw)
  To: linux-kernel, linux-hyperv, x86
  Cc: K . Y . Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Michael Kelley, Vitaly Kuznetsov, Dexuan Cui, Wei Liu,
	Andrea Parri

Michael reported that the x86/hyperv initialization code printed the
following dmesg when running in a VM on Hyper-V:

  [    0.000738] Booting paravirtualized kernel on bare hardware

Let the x86/hyperv initialization code set pv_info.name to "Hyper-V";
with this addition, the dmesg read:

  [    0.000172] Booting paravirtualized kernel on Hyper-V

Reported-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
---
Changes since v1 ([1]):
  - move the setting of pv_info.name to ms_hyperv_init_platform() (Wei Liu)

[1] https://lkml.kernel.org/r/20191015092937.11244-1-parri.andrea@gmail.com

 arch/x86/kernel/cpu/mshyperv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 267daad8c0360..e7f0776e2a811 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -216,6 +216,8 @@ static void __init ms_hyperv_init_platform(void)
 	int hv_host_info_ecx;
 	int hv_host_info_edx;
 
+	pv_info.name = "Hyper-V";
+
 	/*
 	 * Extract the features and hints
 	 */
-- 
2.23.0


^ permalink raw reply related

* Re: [PATCH] x86/hyperv: Set pv_info.name to "Hyper-V"
From: Andrea Parri @ 2019-10-15 10:25 UTC (permalink / raw)
  To: Wei Liu
  Cc: Linux Kernel List, Linux on Hyper-V List, x86, K . Y . Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Sasha Levin, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H . Peter Anvin, Michael Kelley,
	Vitaly Kuznetsov, Dexuan Cui
In-Reply-To: <CAHd7Wqxn3sQQWkzOBrJ1KYm8eUpwa_9dcSYRDfPGAMWm=qvbag@mail.gmail.com>

> > @@ -154,6 +154,8 @@ static uint32_t  __init ms_hyperv_platform(void)
> 
> This function is for platform detection only.
> 
> >         if (!boot_cpu_has(X86_FEATURE_HYPERVISOR))
> >                 return 0;
> >
> > +       pv_info.name = "Hyper-V";
> > +
> 
> At this point we're not sure if Linux is really running on Hyper-V yet.
> 
> Setting pv_info.name should be moved to the init_platform hook, i.e.
> ms_hyperv_init_platform.

Thank you for the review, Wei.  I'll move this to the init_platform hook
and re-submit shortly.

Thanks,
  Andrea

^ permalink raw reply

* Re: [PATCH] x86/hyperv: Set pv_info.name to "Hyper-V"
From: Wei Liu @ 2019-10-15  9:39 UTC (permalink / raw)
  To: Andrea Parri
  Cc: Linux Kernel List, Linux on Hyper-V List, x86, K . Y . Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Sasha Levin, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H . Peter Anvin, Michael Kelley,
	Vitaly Kuznetsov, Dexuan Cui
In-Reply-To: <20191015092937.11244-1-parri.andrea@gmail.com>

On Tue, 15 Oct 2019 at 10:30, Andrea Parri <parri.andrea@gmail.com> wrote:
>
> Michael reported that the x86/hyperv initialization code printed the
> following dmesg when running in a VM on Hyper-V:
>
>   [    0.000738] Booting paravirtualized kernel on bare hardware
>
> Let the x86/hyperv initialization code set pv_info.name to "Hyper-V";
> with this addition, the dmesg read:
>
>   [    0.000138] Booting paravirtualized kernel on Hyper-V
>
> Reported-by: Michael Kelley <mikelley@microsoft.com>
> Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
> ---
>  arch/x86/kernel/cpu/mshyperv.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 105844d542e5..c7d1801fa88b 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -154,6 +154,8 @@ static uint32_t  __init ms_hyperv_platform(void)

This function is for platform detection only.

>         if (!boot_cpu_has(X86_FEATURE_HYPERVISOR))
>                 return 0;
>
> +       pv_info.name = "Hyper-V";
> +

At this point we're not sure if Linux is really running on Hyper-V yet.

Setting pv_info.name should be moved to the init_platform hook, i.e.
ms_hyperv_init_platform.

Wei.

>         cpuid(HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS,
>               &eax, &hyp_signature[0], &hyp_signature[1], &hyp_signature[2]);
>
> --
> 2.17.1
>

^ permalink raw reply

* [PATCH] x86/hyperv: Set pv_info.name to "Hyper-V"
From: Andrea Parri @ 2019-10-15  9:29 UTC (permalink / raw)
  To: linux-kernel, linux-hyperv, x86
  Cc: K . Y . Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Michael Kelley, Vitaly Kuznetsov, Dexuan Cui, Andrea Parri

Michael reported that the x86/hyperv initialization code printed the
following dmesg when running in a VM on Hyper-V:

  [    0.000738] Booting paravirtualized kernel on bare hardware

Let the x86/hyperv initialization code set pv_info.name to "Hyper-V";
with this addition, the dmesg read:

  [    0.000138] Booting paravirtualized kernel on Hyper-V

Reported-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
---
 arch/x86/kernel/cpu/mshyperv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 105844d542e5..c7d1801fa88b 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -154,6 +154,8 @@ static uint32_t  __init ms_hyperv_platform(void)
 	if (!boot_cpu_has(X86_FEATURE_HYPERVISOR))
 		return 0;
 
+	pv_info.name = "Hyper-V";
+
 	cpuid(HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS,
 	      &eax, &hyp_signature[0], &hyp_signature[1], &hyp_signature[2]);
 
-- 
2.17.1


^ permalink raw reply related

* [PATCH v6 1/5] Revert "KVM: X86: Fix setup the virt_spin_lock_key before static key get initialized"
From: Zhenzhong Duan @ 2019-10-15  1:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: vkuznets, linux-hyperv, kvm, kys, haiyangz, sthemmin, sashal,
	tglx, mingo, bp, pbonzini, rkrcmar, sean.j.christopherson,
	wanpengli, jmattson, joro, boris.ostrovsky, jgross, sstabellini,
	peterz, Zhenzhong Duan, H. Peter Anvin
In-Reply-To: <1571102367-31595-1-git-send-email-zhenzhong.duan@oracle.com>

This reverts commit 34226b6b70980a8f81fff3c09a2c889f77edeeff.

Commit 8990cac6e5ea ("x86/jump_label: Initialize static branching
early") adds jump_label_init() call in setup_arch() to make static
keys initialized early, so we could use the original simpler code
again.

The similar change for XEN is in commit 090d54bcbc54 ("Revert
"x86/paravirt: Set up the virt_spin_lock_key after static keys get
initialized"")

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/kernel/kvm.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index e820568..3bc6a266 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -527,13 +527,6 @@ static void kvm_smp_send_call_func_ipi(const struct cpumask *mask)
 	}
 }
 
-static void __init kvm_smp_prepare_cpus(unsigned int max_cpus)
-{
-	native_smp_prepare_cpus(max_cpus);
-	if (kvm_para_has_hint(KVM_HINTS_REALTIME))
-		static_branch_disable(&virt_spin_lock_key);
-}
-
 static void __init kvm_smp_prepare_boot_cpu(void)
 {
 	/*
@@ -633,7 +626,6 @@ static void __init kvm_guest_init(void)
 		apic_set_eoi_write(kvm_guest_apic_eoi_write);
 
 #ifdef CONFIG_SMP
-	smp_ops.smp_prepare_cpus = kvm_smp_prepare_cpus;
 	smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
 	if (kvm_para_has_feature(KVM_FEATURE_PV_SCHED_YIELD) &&
 	    !kvm_para_has_hint(KVM_HINTS_REALTIME) &&
@@ -835,8 +827,10 @@ void __init kvm_spinlock_init(void)
 	if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
 		return;
 
-	if (kvm_para_has_hint(KVM_HINTS_REALTIME))
+	if (kvm_para_has_hint(KVM_HINTS_REALTIME)) {
+		static_branch_disable(&virt_spin_lock_key);
 		return;
+	}
 
 	/* Don't use the pvqspinlock code if there is only 1 vCPU. */
 	if (num_possible_cpus() == 1)
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH v6 5/5] x86/hyperv: Mark "hv_nopvspin" parameter obsolete
From: Zhenzhong Duan @ 2019-10-15  1:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: vkuznets, linux-hyperv, kvm, kys, haiyangz, sthemmin, sashal,
	tglx, mingo, bp, pbonzini, rkrcmar, sean.j.christopherson,
	wanpengli, jmattson, joro, boris.ostrovsky, jgross, sstabellini,
	peterz, Zhenzhong Duan, Jonathan Corbet, H. Peter Anvin
In-Reply-To: <1571102367-31595-1-git-send-email-zhenzhong.duan@oracle.com>

Map "hv_nopvspin" to "nopvspin".

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 Documentation/admin-guide/kernel-parameters.txt | 6 +++++-
 arch/x86/hyperv/hv_spinlock.c                   | 4 ++++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 85059dd..78648bb 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1436,6 +1436,10 @@
 	hv_nopvspin	[X86,HYPER_V] Disables the paravirt spinlock optimizations
 				      which allow the hypervisor to 'idle' the
 				      guest on lock contention.
+				      This parameter is obsoleted by "nopvspin"
+				      parameter, which has equivalent effect for
+				      HYPER_V platform.
+
 
 	keep_bootcon	[KNL]
 			Do not unregister boot console at start. This is only
@@ -5335,7 +5339,7 @@
 			as generic guest with no PV drivers. Currently support
 			XEN HVM, KVM, HYPER_V and VMWARE guest.
 
-	nopvspin	[X86,XEN,KVM]
+	nopvspin	[X86,XEN,KVM,HYPER_V]
 			Disables the qspinlock slow path using PV optimizations
 			which allow the hypervisor to 'idle' the guest on lock
 			contention.
diff --git a/arch/x86/hyperv/hv_spinlock.c b/arch/x86/hyperv/hv_spinlock.c
index 07f21a0..47c7d6c 100644
--- a/arch/x86/hyperv/hv_spinlock.c
+++ b/arch/x86/hyperv/hv_spinlock.c
@@ -64,6 +64,9 @@ __visible bool hv_vcpu_is_preempted(int vcpu)
 
 void __init hv_init_spinlocks(void)
 {
+	if (nopvspin)
+		hv_pvspin = false;
+
 	if (!hv_pvspin || !apic ||
 	    !(ms_hyperv.hints & HV_X64_CLUSTER_IPI_RECOMMENDED) ||
 	    !(ms_hyperv.features & HV_X64_MSR_GUEST_IDLE_AVAILABLE)) {
@@ -82,6 +85,7 @@ void __init hv_init_spinlocks(void)
 
 static __init int hv_parse_nopvspin(char *arg)
 {
+	pr_notice("\"hv_nopvspin\" is deprecated, please use \"nopvspin\" instead\n");
 	hv_pvspin = false;
 	return 0;
 }
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH v6 3/5] x86/kvm: Add "nopvspin" parameter to disable PV spinlocks
From: Zhenzhong Duan @ 2019-10-15  1:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: vkuznets, linux-hyperv, kvm, kys, haiyangz, sthemmin, sashal,
	tglx, mingo, bp, pbonzini, rkrcmar, sean.j.christopherson,
	wanpengli, jmattson, joro, boris.ostrovsky, jgross, sstabellini,
	peterz, Zhenzhong Duan, Jonathan Corbet, H. Peter Anvin,
	Will Deacon
In-Reply-To: <1571102367-31595-1-git-send-email-zhenzhong.duan@oracle.com>

There are cases where a guest tries to switch spinlocks to bare metal
behavior (e.g. by setting "xen_nopvspin" on XEN platform and
"hv_nopvspin" on HYPER_V).

That feature is missed on KVM, add a new parameter "nopvspin" to disable
PV spinlocks for KVM guest.

The new 'nopvspin' parameter will also replace Xen and Hyper-V specific
parameters in future patches.

Define variable nopvsin as global because it will be used in future
patches as above.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
---
 Documentation/admin-guide/kernel-parameters.txt |  5 ++++
 arch/x86/include/asm/qspinlock.h                |  1 +
 arch/x86/kernel/kvm.c                           | 34 ++++++++++++++++++++++---
 kernel/locking/qspinlock.c                      |  7 +++++
 4 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a84a83f..bd49ed2 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5334,6 +5334,11 @@
 			as generic guest with no PV drivers. Currently support
 			XEN HVM, KVM, HYPER_V and VMWARE guest.
 
+	nopvspin	[X86,KVM]
+			Disables the qspinlock slow path using PV optimizations
+			which allow the hypervisor to 'idle' the guest on lock
+			contention.
+
 	xirc2ps_cs=	[NET,PCMCIA]
 			Format:
 			<irq>,<irq_mask>,<io>,<full_duplex>,<do_sound>,<lockup_hack>[,<irq2>[,<irq3>[,<irq4>]]]
diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
index 444d6fd..d86ab94 100644
--- a/arch/x86/include/asm/qspinlock.h
+++ b/arch/x86/include/asm/qspinlock.h
@@ -32,6 +32,7 @@ static __always_inline u32 queued_fetch_set_pending_acquire(struct qspinlock *lo
 extern void __pv_init_lock_hash(void);
 extern void __pv_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
 extern void __raw_callee_save___pv_queued_spin_unlock(struct qspinlock *lock);
+extern bool nopvspin;
 
 #define	queued_spin_unlock queued_spin_unlock
 /**
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 249f14a..e9c76d8 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -825,18 +825,44 @@ __visible bool __kvm_vcpu_is_preempted(long cpu)
  */
 void __init kvm_spinlock_init(void)
 {
-	/* Does host kernel support KVM_FEATURE_PV_UNHALT? */
-	if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
+	/*
+	 * PV spinlocks is disabled if no host side support, then native
+	 * qspinlock will be used. As native qspinlock is a fair lock, there is
+	 * lock holder preemption issue using it in a guest, imaging one pCPU
+	 * running 10 vCPUs of same guest contending same lock.
+	 *
+	 * virt_spin_lock() is introduced as an optimization for that scenario
+	 * which is enabled by virt_spin_lock_key key. To use that optimization,
+	 * virt_spin_lock_key isn't disabled here.
+	 */
+	if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT)) {
+		pr_info("PV spinlocks disabled, no host support.\n");
 		return;
+	}
 
+	/*
+	 * Disable PV qspinlock and use native qspinlock when dedicated pCPUs
+	 * are available.
+	 */
 	if (kvm_para_has_hint(KVM_HINTS_REALTIME)) {
+		pr_info("PV spinlocks disabled with KVM_HINTS_REALTIME hints.\n");
+		static_branch_disable(&virt_spin_lock_key);
+		return;
+	}
+
+	if (num_possible_cpus() == 1) {
+		pr_info("PV spinlocks disabled, single CPU.\n");
 		static_branch_disable(&virt_spin_lock_key);
 		return;
 	}
 
-	/* Don't use the pvqspinlock code if there is only 1 vCPU. */
-	if (num_possible_cpus() == 1)
+	if (nopvspin) {
+		pr_info("PV spinlocks disabled, forced by \"nopvspin\" parameter.\n");
+		static_branch_disable(&virt_spin_lock_key);
 		return;
+	}
+
+	pr_info("PV spinlocks enabled\n");
 
 	__pv_init_lock_hash();
 	pv_ops.lock.queued_spin_lock_slowpath = __pv_queued_spin_lock_slowpath;
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 2473f10..75193d6 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -580,4 +580,11 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 #include "qspinlock_paravirt.h"
 #include "qspinlock.c"
 
+bool nopvspin __initdata;
+static __init int parse_nopvspin(char *arg)
+{
+	nopvspin = true;
+	return 0;
+}
+early_param("nopvspin", parse_nopvspin);
 #endif
-- 
1.8.3.1


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox