Linux-HyperV List

Linux-HyperV List
 help / color / mirror / Atom feed

* [PATCH v2] scsi: storvsc: Add the support of hibernation
From: Dexuan Cui @ 2019-09-25 20:11 UTC (permalink / raw)
  To: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, jejb@linux.ibm.com, martin.petersen@oracle.com,
	linux-hyperv@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-kernel@vger.kernel.org, Michael Kelley
  Cc: Dexuan Cui

When we're in storvsc_suspend(), we're sure the SCSI layer has quiesced the
scsi device by scsi_bus_suspend() -> ... -> scsi_device_quiesce(), so the
low level SCSI adapter driver only needs to suspend/resume its own state.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>

---

This patch is basically a pure Hyper-V specific change and it has a
build dependency on the commit 271b2224d42f ("Drivers: hv: vmbus: Implement
suspend/resume for VSC drivers for hibernation"), which is on Sasha Levin's
Hyper-V tree's hyperv-next branch:
https://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git/log/?h=hyperv-next

I request this patch should go through Sasha's tree rather than the
SCSI tree.

In v2:
	Added Acked-by from Martin.

	@Sasha: I think the patch can go through the hyper-v tree, since
                we have the Ack from Martin. :-)

	No other change.


 drivers/scsi/storvsc_drv.c | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index ed8b9ac..9fbf604 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1727,6 +1727,13 @@ enum {
 
 MODULE_DEVICE_TABLE(vmbus, id_table);
 
+static const struct { guid_t guid; } fc_guid = { HV_SYNTHFC_GUID };
+
+static bool hv_dev_is_fc(struct hv_device *hv_dev)
+{
+	return guid_equal(&fc_guid.guid, &hv_dev->dev_type);
+}
+
 static int storvsc_probe(struct hv_device *device,
 			const struct hv_vmbus_device_id *dev_id)
 {
@@ -1935,11 +1942,45 @@ static int storvsc_remove(struct hv_device *dev)
 	return 0;
 }
 
+static int storvsc_suspend(struct hv_device *hv_dev)
+{
+	struct storvsc_device *stor_device = hv_get_drvdata(hv_dev);
+	struct Scsi_Host *host = stor_device->host;
+	struct hv_host_device *host_dev = shost_priv(host);
+
+	storvsc_wait_to_drain(stor_device);
+
+	drain_workqueue(host_dev->handle_error_wq);
+
+	vmbus_close(hv_dev->channel);
+
+	memset(stor_device->stor_chns, 0,
+	       num_possible_cpus() * sizeof(void *));
+
+	kfree(stor_device->stor_chns);
+	stor_device->stor_chns = NULL;
+
+	cpumask_clear(&stor_device->alloced_cpus);
+
+	return 0;
+}
+
+static int storvsc_resume(struct hv_device *hv_dev)
+{
+	int ret;
+
+	ret = storvsc_connect_to_vsp(hv_dev, storvsc_ringbuffer_size,
+				     hv_dev_is_fc(hv_dev));
+	return ret;
+}
+
 static struct hv_driver storvsc_drv = {
 	.name = KBUILD_MODNAME,
 	.id_table = id_table,
 	.probe = storvsc_probe,
 	.remove = storvsc_remove,
+	.suspend = storvsc_suspend,
+	.resume = storvsc_resume,
 	.driver = {
 		.probe_type = PROBE_PREFER_ASYNCHRONOUS,
 	},
-- 
1.8.3.1


^ permalink raw reply related

* RE: [PATCH] hv_balloon: Add the support of hibernation
From: Dexuan Cui @ 2019-09-25 20:03 UTC (permalink / raw)
  To: Dexuan Cui, David Hildenbrand, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, sashal@kernel.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	Michael Kelley
In-Reply-To: <PU1P153MB01699AB87526B16F7AB94045BFB20@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

> From: linux-hyperv-owner@vger.kernel.org
>  [... snipped ...]
> > Anyhow, just some comments from my side :) I can see how Windows Server
> > worked around that issue right now by just XOR'ing both features.
> >
> > David / dhildenb
> 
> Thanks for sharing your thoughts!
> 
> -- Dexuan

Hi David,
If my explanation sounds good to you, can I have an Acked-by from you? 

Thanks,
-- Dexuan

^ permalink raw reply

* RE: [PATCH 0/4] Enhance pci-hyperv to support hibernation
From: Dexuan Cui @ 2019-09-25 19:58 UTC (permalink / raw)
  To: Dexuan Cui, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, lorenzo.pieralisi@arm.com, bhelgaas@google.com,
	linux-hyperv@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, Michael Kelley
In-Reply-To: <1568245086-70601-1-git-send-email-decui@microsoft.com>

> From: Dexuan Cui <decui@microsoft.com>
> Sent: Wednesday, September 11, 2019 4:38 PM
> To: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>; Stephen Hemminger
> <sthemmin@microsoft.com>; sashal@kernel.org; lorenzo.pieralisi@arm.com;
> bhelgaas@google.com; linux-hyperv@vger.kernel.org;
> linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Michael Kelley
> <mikelley@microsoft.com>
> Cc: Dexuan Cui <decui@microsoft.com>
> Subject: [PATCH 0/4] Enhance pci-hyperv to support hibernation
> 
> This patchset is basically a pure Hyper-V specific change and it has a
> build dependency on the commit 271b2224d42f ("Drivers: hv: vmbus:
> Implement
> suspend/resume for VSC drivers for hibernation"), which is on Sasha Levin's
> Hyper-V tree's hyperv-next branch: [... snipped ...]
> 
> I request this patch should go through Sasha's tree rather than the
> pci tree.
> 
> Dexuan Cui (4):
>   PCI: hv: Reorganize the code in preparation of hibernation
>   PCI: hv: Add the support of hibernation
>   PCI: hv: Do not queue new work items on hibernation
>   PCI: hv: Change pci_protocol_version to per-hbus
> 
>  drivers/pci/controller/pci-hyperv.c | 166
> ++++++++++++++++++++++++++++++------
>  1 file changed, 140 insertions(+), 26 deletions(-)

Hi Lorenzo, Bjorn, and all,

Can you please take a look at the patchset (4 patches in total)?

Thanks,
-- Dexuan


^ permalink raw reply

* RE: [PATCH] HID: hyperv: Add the support of hibernation
From: Dexuan Cui @ 2019-09-25 19:53 UTC (permalink / raw)
  To: Dexuan Cui, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, jikos@kernel.org,
	benjamin.tissoires@redhat.com, linux-hyperv@vger.kernel.org,
	linux-input@vger.kernel.org, linux-kernel@vger.kernel.org,
	Michael Kelley
In-Reply-To: <1568244952-66716-1-git-send-email-decui@microsoft.com>

> From: Dexuan Cui <decui@microsoft.com>
> Sent: Wednesday, September 11, 2019 4:36 PM
> 
> We need mousevsc_pm_notify() to make sure the pm_wakeup_hard_event()
> call
> does not prevent the system from entering hibernation: the hibernation
> is a relatively long process, which can be aborted by the call
> pm_wakeup_hard_event(), which is invoked upon mouse events.
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
> 
> This patch is basically a pure Hyper-V specific change and it has a
> build dependency on the commit 271b2224d42f ("Drivers: hv: vmbus:
> Implement
> suspend/resume for VSC drivers for hibernation"), which is on Sasha Levin's
> Hyper-V tree's hyperv-next branch [ ... snipped ...]
> 
> I request this patch should go through Sasha's tree rather than the
> input subsystem's tree.
> 
> Hi Jiri, Benjamin, can you please Ack?

Hi Jiri, Benjamin,
Can you please take a look at the patch?

Thanks,
-- Dexuan

^ permalink raw reply

* RE: [PATCH] Input: hyperv-keyboard: Add the support of hibernation
From: Dexuan Cui @ 2019-09-25 19:49 UTC (permalink / raw)
  To: Dexuan Cui, dmitry.torokhov@gmail.com
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, linux-hyperv@vger.kernel.org,
	linux-input@vger.kernel.org, linux-kernel@vger.kernel.org,
	Michael Kelley
In-Reply-To: <PU1P153MB016914A7C827CA35D7FEB66ABF8B0@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

> From: linux-hyperv-owner@vger.kernel.org
> <linux-hyperv-owner@vger.kernel.org> On Behalf Of Dexuan Cui
> Sent: Friday, September 20, 2019 11:56 PM
> To: dmitry.torokhov@gmail.com
> Cc: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>; Stephen Hemminger
> <sthemmin@microsoft.com>; sashal@kernel.org;
> linux-hyperv@vger.kernel.org; linux-input@vger.kernel.org;
> linux-kernel@vger.kernel.org; Michael Kelley <mikelley@microsoft.com>
> Subject: RE: [PATCH] Input: hyperv-keyboard: Add the support of hibernation
> 
> > From: dmitry.torokhov@gmail.com <dmitry.torokhov@gmail.com>
> > Sent: Thursday, September 19, 2019 9:18 AM
> >
> > Hi Dexuan,
> >
> > On Wed, Sep 11, 2019 at 11:36:20PM +0000, Dexuan Cui wrote:
> > > We need hv_kbd_pm_notify() to make sure the pm_wakeup_hard_event()
> > call
> > > does not prevent the system from entering hibernation: the hibernation
> > > is a relatively long process, which can be aborted by the call
> > > pm_wakeup_hard_event(), which is invoked upon keyboard events.
> > >
> > > diff --git a/drivers/input/serio/hyperv-keyboard.c
> > b/drivers/input/serio/hyperv-keyboard.c
> > > index 88ae7c2..277dc4c 100644
> > > --- a/drivers/input/serio/hyperv-keyboard.c
> > > +++ b/drivers/input/serio/hyperv-keyboard.c
> > > @@ -10,6 +10,7 @@
> > >  #include <linux/hyperv.h>
> > >  #include <linux/serio.h>
> > >  #include <linux/slab.h>
> > > +#include <linux/suspend.h>
> > >
> > >  /*
> > >   * Current version 1.0
> > > @@ -95,6 +96,9 @@ struct hv_kbd_dev {
> > >  	struct completion wait_event;
> > >  	spinlock_t lock; /* protects 'started' field */
> > >  	bool started;
> > > +
> > > +	struct notifier_block pm_nb;
> > > +	bool hibernation_in_progress;
> >
> > Why do you use notifier block instead of exposing proper PM methods if
> > you want to support hibernation?
> >
> > Dmitry
> 
> Hi,
> In the patch I do implement hv_kbd_suspend() and hv_kbd_resume(), and
> add them into the hv_kbd_drv struct:
> 
> @@ -416,6 +472,8 @@ static struct  hv_driver hv_kbd_drv = {
>         .id_table = id_table,
>         .probe = hv_kbd_probe,
>         .remove = hv_kbd_remove,
> +       .suspend = hv_kbd_suspend,
> +       .resume = hv_kbd_resume,
> 
> The .suspend and .resume callbacks are inroduced by another patch (which
> uses the dev_pm_ops struct):
> 271b2224d42f ("Drivers: hv: vmbus: Implement suspend/resume for VSC
> drivers for hibernation")
> (which is on the Hyper-V tree's hyperv-next branch:
> https://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git/commit/?h=hyperv-next&id=271b2224d42f88870e6b060924ee374871c131fc
> )
> 
> The only purpose of the notifier is to set the variable
> kbd_dev->hibernation_in_progress to true during the hibernation process.
> 
> As I explained in the changelog, the hibernation is a long process (which
> can take 10+ seconds), during which the user may unintentionally touch
> the keyboard, causing key up/down events, which are still handled by
> hv_kbd_on_receive(), which calls pm_wakeup_hard_event(), which
> calls some other functions which increase the global counter
> "pm_abort_suspend", and hence pm_wakeup_pending() becomes true.
> 
> pm_wakeup_pending() is tested in a lot of places in the suspend
> process and eventually an unintentional keystroke (or mouse movement,
> when it comes to the Hyper-V mouse driver drivers/hid/hid-hyperv.c)
> causes the whole hibernation process to be aborted. Usually this
> behavior is not expected by the user, I think.
> 
> So, I use the notifier to set the flag variable and with it the driver can
> know when it should not call pm_wakeup_hard_event().
> 
> -- Dexuan

Hi Dmitry,
Does my answer make sense? If yes, can I have an Acked-by from you?

Thanks,
-- Dexuan

^ permalink raw reply

* [PATCH v3] Drivers: hv: vmbus: Fix harmless building warnings without CONFIG_PM_SLEEP
From: Dexuan Cui @ 2019-09-25 18:43 UTC (permalink / raw)
  To: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, Michael Kelley, arnd@arndb.de
  Cc: Dexuan Cui

If CONFIG_PM_SLEEP is not set, we can comment out these functions to avoid
the below warnings:

drivers/hv/vmbus_drv.c:2208:12: warning: ‘vmbus_bus_resume’ defined but not used [-Wunused-function]
drivers/hv/vmbus_drv.c:2128:12: warning: ‘vmbus_bus_suspend’ defined but not used [-Wunused-function]
drivers/hv/vmbus_drv.c:937:12: warning: ‘vmbus_resume’ defined but not used [-Wunused-function]
drivers/hv/vmbus_drv.c:918:12: warning: ‘vmbus_suspend’ defined but not used [-Wunused-function]

Fixes: 271b2224d42f ("Drivers: hv: vmbus: Implement suspend/resume for VSC drivers for hibernation")
Fixes: f53335e3289f ("Drivers: hv: vmbus: Suspend/resume the vmbus itself for hibernation")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---

In v2:
	test CONFIG_PM_SLEEP rather than CONFIG_PM. Thanks, Arnd!

In v3:
	Add Michael's Reviewed-by.
	No other change.

 drivers/hv/vmbus_drv.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 391f0b225c9a..53a60c81e220 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -912,6 +912,7 @@ static void vmbus_shutdown(struct device *child_device)
 		drv->shutdown(dev);
 }
 
+#ifdef CONFIG_PM_SLEEP
 /*
  * vmbus_suspend - Suspend a vmbus device
  */
@@ -949,6 +950,7 @@ static int vmbus_resume(struct device *child_device)
 
 	return drv->resume(dev);
 }
+#endif /* CONFIG_PM_SLEEP */
 
 /*
  * vmbus_device_release - Final callback release of the vmbus child device
@@ -1070,6 +1072,7 @@ void vmbus_on_msg_dpc(unsigned long data)
 	vmbus_signal_eom(msg, message_type);
 }
 
+#ifdef CONFIG_PM_SLEEP
 /*
  * Fake RESCIND_CHANNEL messages to clean up hv_sock channels by force for
  * hibernation, because hv_sock connections can not persist across hibernation.
@@ -1105,6 +1108,7 @@ static void vmbus_force_channel_rescinded(struct vmbus_channel *channel)
 		      vmbus_connection.work_queue,
 		      &ctx->work);
 }
+#endif /* CONFIG_PM_SLEEP */
 
 /*
  * Direct callback for channels using other deferred processing
@@ -2125,6 +2129,7 @@ static int vmbus_acpi_add(struct acpi_device *device)
 	return ret_val;
 }
 
+#ifdef CONFIG_PM_SLEEP
 static int vmbus_bus_suspend(struct device *dev)
 {
 	struct vmbus_channel *channel, *sc;
@@ -2247,6 +2252,7 @@ static int vmbus_bus_resume(struct device *dev)
 
 	return 0;
 }
+#endif /* CONFIG_PM_SLEEP */
 
 static const struct acpi_device_id vmbus_acpi_device_ids[] = {
 	{"VMBUS", 0},
-- 
2.19.1


^ permalink raw reply related

* Re: [Patch v4] storvsc: setup 1:1 mapping between hardware queue and CPU queue
From: Martin K. Petersen @ 2019-09-24  2:53 UTC (permalink / raw)
  To: longli
  Cc: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	James E.J. Bottomley, Martin K. Petersen, linux-hyperv,
	linux-scsi, linux-kernel, Long Li
In-Reply-To: <1567790660-48142-1-git-send-email-longli@linuxonhyperv.com>


Long,

> storvsc doesn't use a dedicated hardware queue for a given CPU
> queue. When issuing I/O, it selects returning CPU (hardware queue)
> dynamically based on vmbus channel usage across all channels.

Applied to 5.4/scsi-fixes. Thanks!

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply

* RE: [Patch v4] storvsc: setup 1:1 mapping between hardware queue and CPU queue
From: Long Li @ 2019-09-24  1:14 UTC (permalink / raw)
  To: Long Li, Sasha Levin, longli@linuxonhyperv.com
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	James E.J. Bottomley, Martin K. Petersen,
	linux-hyperv@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <CY4PR21MB0741A256EEF9D8DBAF50ED03CEBA0@CY4PR21MB0741.namprd21.prod.outlook.com>

>Subject: RE: [Patch v4] storvsc: setup 1:1 mapping between hardware queue
>and CPU queue
>
>>Subject: Re: [Patch v4] storvsc: setup 1:1 mapping between hardware
>>queue and CPU queue
>>
>>On Fri, Sep 06, 2019 at 10:24:20AM -0700, longli@linuxonhyperv.com wrote:
>>>From: Long Li <longli@microsoft.com>
>>>
>>>storvsc doesn't use a dedicated hardware queue for a given CPU queue.
>>>When issuing I/O, it selects returning CPU (hardware queue)
>>>dynamically based on vmbus channel usage across all channels.
>>>
>>>This patch advertises num_present_cpus() as number of hardware queues.
>>>This will have upper layer setup 1:1 mapping between hardware queue
>>>and CPU queue and avoid unnecessary locking when issuing I/O.
>>>
>>>Signed-off-by: Long Li <longli@microsoft.com>

Hi Martin,

I have addressed all comments on this patch. Can you merge it to SCSI?

If there is anything else I need to change, please let me know.

Thanks

Long


>>>---
>>>
>>>Changes:
>>>v2: rely on default upper layer function to map queues. (suggested by
>>>Ming Lei
>>><tom.leiming@gmail.com>)
>>>v3: use num_present_cpus() instead of num_online_cpus(). Hyper-v
>>>doesn't support hot-add CPUs. (suggested by Michael Kelley
>>><mikelley@microsoft.com>)
>>>v4: move change logs to after Signed-of-by
>>>
>>> drivers/scsi/storvsc_drv.c | 3 +--
>>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>>
>>>diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
>>>index b89269120a2d..cf987712041a 100644
>>>--- a/drivers/scsi/storvsc_drv.c
>>>+++ b/drivers/scsi/storvsc_drv.c
>>>@@ -1836,8 +1836,7 @@ static int storvsc_probe(struct hv_device *device,
>>> 	/*
>>> 	 * Set the number of HW queues we are supporting.
>>> 	 */
>>>-	if (stor_device->num_sc != 0)
>>>-		host->nr_hw_queues = stor_device->num_sc + 1;
>>>+	host->nr_hw_queues = num_present_cpus();
>>
>>Just looking at the change notes for v3: why isn't this
>>num_active_cpus() then? One can still isolate CPUs on hyper-v, no?
>
>The isolated CPU can be made online at run time. For example, even
>maxcpus=x is put on the boot line, individual CPUs can still be made
>online/offline.
>
>>
>>--
>>Thanks,
>>Sasha

^ permalink raw reply

* [GIT PULL] Hyper-V commits for 5.4
From: Sasha Levin @ 2019-09-24  1:03 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, linux-hyperv, kys, sthemmin, linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

The following changes since commit 5f9e832c137075045d15cd6899ab0505cfb2ca4b:

  Linus 5.3-rc1 (2019-07-21 14:05:38 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git tags/hyperv-next-signed

for you to fetch changes up to d8bd2d442bb2688b428ac7164e5dc6d95d4fa65b:

  Drivers: hv: vmbus: Resume after fixing up old primary channels (2019-09-06 14:52:44 -0400)

- ----------------------------------------------------------------
- - First round of vmbus hibernation support from Dexuan Cui.
- - Removal of dependencies on PAGE_SIZE by Maya Nakamura.
- - Moving the hyper-v tools/ code into the tools build system by Andy
Shevchenko.
- - hyper-v balloon cleanups by Dexuan Cui.

- ----------------------------------------------------------------
Andy Shevchenko (1):
      Tools: hv: move to tools buildsystem

Dexuan Cui (11):
      hv_balloon: Use a static page for the balloon_up send buffer
      hv_balloon: Reorganize the probe function
      Drivers: hv: vmbus: Break out synic enable and disable operations
      Drivers: hv: vmbus: Suspend/resume the synic for hibernation
      Drivers: hv: vmbus: Add a helper function is_sub_channel()
      Drivers: hv: vmbus: Implement suspend/resume for VSC drivers for hibernation
      Drivers: hv: vmbus: Ignore the offers when resuming from hibernation
      Drivers: hv: vmbus: Suspend/resume the vmbus itself for hibernation
      Drivers: hv: vmbus: Clean up hv_sock channels by force upon suspend
      Drivers: hv: vmbus: Suspend after cleaning up hv_sock and sub channels
      Drivers: hv: vmbus: Resume after fixing up old primary channels

Maya Nakamura (1):
      HID: hv: Remove dependencies on PAGE_SIZE for ring buffer

 drivers/hid/hid-hyperv.c  |   4 +-
 drivers/hv/channel_mgmt.c | 161 +++++++++++++++++++++++++---
 drivers/hv/connection.c   |   8 +-
 drivers/hv/hv.c           |  66 +++++++-----
 drivers/hv/hv_balloon.c   | 143 ++++++++++++-------------
 drivers/hv/hyperv_vmbus.h |  30 ++++++
 drivers/hv/vmbus_drv.c    | 265 ++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/hyperv.h    |  16 ++-
 tools/hv/Build            |   3 +
 tools/hv/Makefile         |  51 +++++++--
 10 files changed, 613 insertions(+), 134 deletions(-)
 create mode 100644 tools/hv/Build
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEE4n5dijQDou9mhzu83qZv95d3LNwFAl2Jaw0ACgkQ3qZv95d3
LNx6xg//WxLzQxi8rnxO2vYFWL9/Hq0ZZfUTHVE3FuRYJdI/IZlFhR0Kw/5MOcT3
PmAFuLNbGFDbDTP21oI+eXVHbrp8mWbSfNRR9EN/BGCErB6WAVTJlrm8ImEe8aBj
yDU4PjMp7gHwSCcuyYh813Uj9jlPybISyroCBnJ2q2M7iSLSb/wlUR+3KB+p99eS
ScyAl0KS7f/hUI1MRUEJ/0i+buEGwG9uBGsEFcicVLqNwD1Mly1ulmyXY6oQm+QZ
P7/PO7k3h/nlCr5QI9rJVsZqy0pZcRp5RI9JIriPHyCQNxy2V7bpvl979XtrODv6
3F2UNBw4I73XUZ4GcQlGeM7kVAc6CuuyiKnbRqa58e4Bjnhzj+VDbk5Brf4LFyYl
ZG4rDuBTBU5cfKi2fjmasYjgfGtQdWdhTeJM+aqsT7ScyevH/8N7k1Vte9K1s5O8
xx1MszBQ27/KxMqKD8TiL2GKQU036FiQdEnnS8We1XD1ryCR6EA+WgWRQennsZa0
jUbGJOTI3SPcjxM8oHzuiJSIK6gKRq+zKY51EVeTNwEOCPnp+WwqVO9I8inVSN0m
WwJd84pxGXuqjjIe8FvmtUE3B0AF7bW8bPV+aNeDEtPOPMPzSW9HHKWS+rllKnsA
3Z4ktdCl1Al8dsKU9mJPuzAGn0Sxg3zKQ9VPTvZUQt2WYRWjKi4=
=y2xA
-----END PGP SIGNATURE-----

^ permalink raw reply

* Re: [PATCH 2/3] KVM: x86: hyper-v: set NoNonArchitecturalCoreSharing CPUID bit when SMT is impossible
From: Paolo Bonzini @ 2019-09-23 16:48 UTC (permalink / raw)
  To: Peter Zijlstra, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, linux-hyperv, x86, Radim Krčmář,
	Sean Christopherson, Jim Mattson, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Michael Kelley, Roman Kagan
In-Reply-To: <20190923153713.GF2369@hirez.programming.kicks-ass.net>

On 23/09/19 17:37, Peter Zijlstra wrote:
>> This patch reports NoNonArchitecturalCoreSharing bit in to userspace in the
>> first case. The second case is outside of KVM's domain of responsibility
>> (as vCPU pinning is actually done by someone who manages KVM's userspace -
>> e.g. libvirt pinning QEMU threads).
> This is purely about guest<->guest MDS, right? Ie. not worse than actual
> hardware.

Even within the same guest.  If vCPU 1 is on virtual core 1 and vCPU 2
is on virtual core 2, but they can share the same physical core, core
scheduling in the guest can do nothing about it.

Paolo

^ permalink raw reply

* Re: [PATCH 2/3] KVM: x86: hyper-v: set NoNonArchitecturalCoreSharing CPUID bit when SMT is impossible
From: Peter Zijlstra @ 2019-09-23 15:37 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, linux-kernel, linux-hyperv, x86, Paolo Bonzini,
	Radim Krčmář, Sean Christopherson, Jim Mattson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Michael Kelley, Roman Kagan
In-Reply-To: <20190916162258.6528-3-vkuznets@redhat.com>

On Mon, Sep 16, 2019 at 06:22:57PM +0200, Vitaly Kuznetsov wrote:
> Hyper-V 2019 doesn't expose MD_CLEAR CPUID bit to guests when it cannot
> guarantee that two virtual processors won't end up running on sibling SMT
> threads without knowing about it. This is done as an optimization as in
> this case there is nothing the guest can do to protect itself against MDS
> and issuing additional flush requests is just pointless. On bare metal the
> topology is known, however, when Hyper-V is running nested (e.g. on top of
> KVM) it needs an additional piece of information: a confirmation that the
> exposed topology (wrt vCPU placement on different SMT threads) is
> trustworthy.
> 
> NoNonArchitecturalCoreSharing (CPUID 0x40000004 EAX bit 18) is described in
> TLFS as follows: "Indicates that a virtual processor will never share a
> physical core with another virtual processor, except for virtual processors
> that are reported as sibling SMT threads." From KVM we can give such
> guarantee in two cases:
> - SMT is unsupported or forcefully disabled (just 'disabled' doesn't work
>  as it can become re-enabled during the lifetime of the guest).
> - vCPUs are properly pinned so the scheduler won't put them on sibling
> SMT threads (when they're not reported as such).
> 
> This patch reports NoNonArchitecturalCoreSharing bit in to userspace in the
> first case. The second case is outside of KVM's domain of responsibility
> (as vCPU pinning is actually done by someone who manages KVM's userspace -
> e.g. libvirt pinning QEMU threads).

This is purely about guest<->guest MDS, right? Ie. not worse than actual
hardware.

^ permalink raw reply

* Re: [PATCH v1 0/3] mm/memory_hotplug: Export generic_online_page()
From: David Hildenbrand @ 2019-09-23 12:34 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, linux-mm, Souptick Joarder, linux-hyperv,
	Andrew Morton, Dan Williams, Haiyang Zhang, K. Y. Srinivasan,
	Oscar Salvador, Pavel Tatashin, Qian Cai, Sasha Levin,
	Stephen Hemminger, Wei Yang
In-Reply-To: <20190923123008.GP6016@dhcp22.suse.cz>

On 23.09.19 14:30, Michal Hocko wrote:
> On Mon 23-09-19 14:20:05, David Hildenbrand wrote:
>> On 23.09.19 14:07, Michal Hocko wrote:
>>> On Mon 23-09-19 13:34:18, David Hildenbrand wrote:
>>>> On 23.09.19 13:15, Michal Hocko wrote:
>>> [...]
>>>>> I am wondering why those pages get onlined when they are, in fact,
>>>>> supposed to be offline.
>>>>>
>>>>
>>>> It's the current way of emulating sub-memory-block hotplug on top of the
>>>> memory bock device API we have. Hyper-V and XEN have been using that for
>>>> a long time.
>>>
>>> Do they really have to use the existing block interface when they in
>>> fact do not operate on the block granularity? Zone device memory already
>>> acts on sub section/block boundaries.
>>>
>>
>> Yes, we need memory blocks, especially for user space to properly online
>> them (as we discussed a while back, to decide on a zone) and for udev
>> events, to e.g., properly reload kexec when memory blocks get
>> added/removed/onlined/offlined.
> 
> Just to make sure I really follow. We need a user interface to control
> where the memory gets onlined but it is the driver which determines
> which part of the block really gets onlined, right?
> 

Yes, it's the drivers job to decide which part of the memory block can
actually get used, and when.

It's a little bit like the driver immediately allocates unbacked memory
again, to free that memory to the buddy once requested. (similar to how
ballooning works).

-- 

Thanks,

David / dhildenb

^ permalink raw reply

* Re: [PATCH v1 0/3] mm/memory_hotplug: Export generic_online_page()
From: Michal Hocko @ 2019-09-23 12:30 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Souptick Joarder, linux-hyperv,
	Andrew Morton, Dan Williams, Haiyang Zhang, K. Y. Srinivasan,
	Oscar Salvador, Pavel Tatashin, Qian Cai, Sasha Levin,
	Stephen Hemminger, Wei Yang
In-Reply-To: <b60b783e-a251-f155-3cef-e0fa4a18abd0@redhat.com>

On Mon 23-09-19 14:20:05, David Hildenbrand wrote:
> On 23.09.19 14:07, Michal Hocko wrote:
> > On Mon 23-09-19 13:34:18, David Hildenbrand wrote:
> >> On 23.09.19 13:15, Michal Hocko wrote:
> > [...]
> >>> I am wondering why those pages get onlined when they are, in fact,
> >>> supposed to be offline.
> >>>
> >>
> >> It's the current way of emulating sub-memory-block hotplug on top of the
> >> memory bock device API we have. Hyper-V and XEN have been using that for
> >> a long time.
> > 
> > Do they really have to use the existing block interface when they in
> > fact do not operate on the block granularity? Zone device memory already
> > acts on sub section/block boundaries.
> > 
> 
> Yes, we need memory blocks, especially for user space to properly online
> them (as we discussed a while back, to decide on a zone) and for udev
> events, to e.g., properly reload kexec when memory blocks get
> added/removed/onlined/offlined.

Just to make sure I really follow. We need a user interface to control
where the memory gets onlined but it is the driver which determines
which part of the block really gets onlined, right?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply

* Re: [PATCH v1 0/3] mm/memory_hotplug: Export generic_online_page()
From: David Hildenbrand @ 2019-09-23 12:20 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, linux-mm, Souptick Joarder, linux-hyperv,
	Andrew Morton, Dan Williams, Haiyang Zhang, K. Y. Srinivasan,
	Oscar Salvador, Pavel Tatashin, Qian Cai, Sasha Levin,
	Stephen Hemminger, Wei Yang
In-Reply-To: <20190923120719.GM6016@dhcp22.suse.cz>

On 23.09.19 14:07, Michal Hocko wrote:
> On Mon 23-09-19 13:34:18, David Hildenbrand wrote:
>> On 23.09.19 13:15, Michal Hocko wrote:
> [...]
>>> I am wondering why those pages get onlined when they are, in fact,
>>> supposed to be offline.
>>>
>>
>> It's the current way of emulating sub-memory-block hotplug on top of the
>> memory bock device API we have. Hyper-V and XEN have been using that for
>> a long time.
> 
> Do they really have to use the existing block interface when they in
> fact do not operate on the block granularity? Zone device memory already
> acts on sub section/block boundaries.
> 

Yes, we need memory blocks, especially for user space to properly online
them (as we discussed a while back, to decide on a zone) and for udev
events, to e.g., properly reload kexec when memory blocks get
added/removed/onlined/offlined.

ZONE_DEVICE is special as it doesn't have to worry about
onlining/offlining/dumping. Also, it doesn't care about
SECTION_IS_ONLINE, but figuring out how to detect which sub
sections/blocks have a valid memmap is still something to get fixed and
is still getting discussed on MM.

-- 

Thanks,

David / dhildenb

^ permalink raw reply

* Re: [PATCH v1 0/3] mm/memory_hotplug: Export generic_online_page()
From: Michal Hocko @ 2019-09-23 12:07 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Souptick Joarder, linux-hyperv,
	Andrew Morton, Dan Williams, Haiyang Zhang, K. Y. Srinivasan,
	Oscar Salvador, Pavel Tatashin, Qian Cai, Sasha Levin,
	Stephen Hemminger, Wei Yang
In-Reply-To: <88ac3511-4ad8-d5c8-8e6a-0cca0a0f0989@redhat.com>

On Mon 23-09-19 13:34:18, David Hildenbrand wrote:
> On 23.09.19 13:15, Michal Hocko wrote:
[...]
> > I am wondering why those pages get onlined when they are, in fact,
> > supposed to be offline.
> > 
> 
> It's the current way of emulating sub-memory-block hotplug on top of the
> memory bock device API we have. Hyper-V and XEN have been using that for
> a long time.

Do they really have to use the existing block interface when they in
fact do not operate on the block granularity? Zone device memory already
acts on sub section/block boundaries.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply

* Re: [PATCH v1 0/3] mm/memory_hotplug: Export generic_online_page()
From: David Hildenbrand @ 2019-09-23 11:43 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, linux-mm, Souptick Joarder, linux-hyperv,
	Andrew Morton, Dan Williams, Haiyang Zhang, K. Y. Srinivasan,
	Oscar Salvador, Pavel Tatashin, Qian Cai, Sasha Levin,
	Stephen Hemminger, Wei Yang
In-Reply-To: <88ac3511-4ad8-d5c8-8e6a-0cca0a0f0989@redhat.com>

On 23.09.19 13:34, David Hildenbrand wrote:
> On 23.09.19 13:15, Michal Hocko wrote:
>> On Mon 23-09-19 11:31:30, David Hildenbrand wrote:
>>> On 23.09.19 10:58, Michal Hocko wrote:
>>>> On Fri 20-09-19 10:17:54, David Hildenbrand wrote:
>>>>> On 09.09.19 13:48, David Hildenbrand wrote:
>>>>>> Based on linux/next + "[PATCH 0/3] Remove __online_page_set_limits()"
>>>>>>
>>>>>> Let's replace the __online_page...() functions by generic_online_page().
>>>>>> Hyper-V only wants to delay the actual onlining of un-backed pages, so we
>>>>>> can simpy re-use the generic function.
>>>>>>
>>>>>> Only compile-tested.
>>>>>>
>>>>>> Cc: Souptick Joarder <jrdr.linux@gmail.com>
>>>>>>
>>>>>> David Hildenbrand (3):
>>>>>>   mm/memory_hotplug: Export generic_online_page()
>>>>>>   hv_balloon: Use generic_online_page()
>>>>>>   mm/memory_hotplug: Remove __online_page_free() and
>>>>>>     __online_page_increment_counters()
>>>>>>
>>>>>>  drivers/hv/hv_balloon.c        |  3 +--
>>>>>>  include/linux/memory_hotplug.h |  4 +---
>>>>>>  mm/memory_hotplug.c            | 17 ++---------------
>>>>>>  3 files changed, 4 insertions(+), 20 deletions(-)
>>>>>>
>>>>>
>>>>> Ping, any comments on this one?
>>>>
>>>> Unification makes a lot of sense to me. You can add
>>>> Acked-by: Michal Hocko <mhocko@suse.com>
>>>>
>>>> I will most likely won't surprise if I asked for more here though ;)
>>>
>>> I'm not surprised, but definitely not in a negative sense ;) I was
>>> asking myself if we could somehow rework this, too.
>>>
>>>> I have to confess I really detest the whole concept of a hidden callback
>>>> with a very weird API. Is this something we can do about? I do realize
>>>> that adding a callback would require either cluttering the existing APIs
>>>> but maybe we can come up with something more clever. Or maybe existing
>>>> external users of online callback can do that as a separate step after
>>>> the online is completed - or is this impossible due to locking
>>>> guarantees?
>>>>
>>>
>>> The use case of this (somewhat special) callback really is to avoid
>>> selected (unbacked in the hypervisor) pages to get put to the buddy just
>>> now, but instead to defer that (sometimes, defer till infinity ;) ).
>>> Especially, to hinder these pages from getting touched at all. Pages
>>> that won't be put to the buddy will usually get PG_offline set (e.g.,
>>> Hyper-V and XEN) - the only two users I am aware of.
>>>
>>> For Hyper-V (and also eventually virtio-mem), it is important to set
>>> PG_offline before marking the section to be online (SECTION_IS_ONLINE).
>>> Only this way, PG_offline is properly set on all pfn_to_online_page()
>>> pages, meaning "don't touch this page" - e.g., used to skip over such
>>> pages when suspending or by makedumpfile to skip over such offline pages
>>> when creating a memory dump.
>>
>> Thanks for the clarification. I have never really studied what those
>> callbacks are doing really.
>>
>>> So if we would e.g., try to piggy-back onto the memory_notify()
>>> infrastructure, we could
>>> 1. Online all pages to the buddy (dropping the callback)
>>> 2. E.g., memory_notify(MEM_ONLINE_PAGES, &arg);
>>> -> in the notifier, pull pages from the buddy, mark sections online
>>> 3. Set all involved sections online (online_mem_sections())
>>
>> This doesn't really sound any better. For one pages are immediately
>> usable when they hit the buddy allocator so this is racy and thus not
>> reliable.
>>
>>> However, I am not sure what actually happens after 1. - we are only
>>> holding the device hotplug lock and the memory hotplug lock, so the
>>> pages can just get allocated. Also, it sounds like more work and code
>>> for the same end result (okay, if the rework is really necessary, though).
>>>
>>> So yeah, while the current callback might not be optimal, I don't see an
>>> easy and clean way to rework this. With the change in this series we are
>>> at least able to simply defer doing what would have been done without
>>> the callback - not perfect but better.
>>>
>>> Do you have anything in mind that could work out and make this nicer?
>>
>> I am wondering why those pages get onlined when they are, in fact,
>> supposed to be offline.
>>
> 
> It's the current way of emulating sub-memory-block hotplug on top of the
> memory bock device API we have. Hyper-V and XEN have been using that for
> a long time.
> 

So one idea would be to let clients set pages to PG_offline during
MEM_GOING_ONLINE. We could then skip any PG_offline pages when onlining
pages, not onlining them to the buddy.

But then, there still has to be a way to online pages when required -
e.g., generic_online_page(). At least the callback could go.

-- 

Thanks,

David / dhildenb

^ permalink raw reply

* Re: [PATCH v1 0/3] mm/memory_hotplug: Export generic_online_page()
From: David Hildenbrand @ 2019-09-23 11:34 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, linux-mm, Souptick Joarder, linux-hyperv,
	Andrew Morton, Dan Williams, Haiyang Zhang, K. Y. Srinivasan,
	Oscar Salvador, Pavel Tatashin, Qian Cai, Sasha Levin,
	Stephen Hemminger, Wei Yang
In-Reply-To: <20190923111559.GK6016@dhcp22.suse.cz>

On 23.09.19 13:15, Michal Hocko wrote:
> On Mon 23-09-19 11:31:30, David Hildenbrand wrote:
>> On 23.09.19 10:58, Michal Hocko wrote:
>>> On Fri 20-09-19 10:17:54, David Hildenbrand wrote:
>>>> On 09.09.19 13:48, David Hildenbrand wrote:
>>>>> Based on linux/next + "[PATCH 0/3] Remove __online_page_set_limits()"
>>>>>
>>>>> Let's replace the __online_page...() functions by generic_online_page().
>>>>> Hyper-V only wants to delay the actual onlining of un-backed pages, so we
>>>>> can simpy re-use the generic function.
>>>>>
>>>>> Only compile-tested.
>>>>>
>>>>> Cc: Souptick Joarder <jrdr.linux@gmail.com>
>>>>>
>>>>> David Hildenbrand (3):
>>>>>   mm/memory_hotplug: Export generic_online_page()
>>>>>   hv_balloon: Use generic_online_page()
>>>>>   mm/memory_hotplug: Remove __online_page_free() and
>>>>>     __online_page_increment_counters()
>>>>>
>>>>>  drivers/hv/hv_balloon.c        |  3 +--
>>>>>  include/linux/memory_hotplug.h |  4 +---
>>>>>  mm/memory_hotplug.c            | 17 ++---------------
>>>>>  3 files changed, 4 insertions(+), 20 deletions(-)
>>>>>
>>>>
>>>> Ping, any comments on this one?
>>>
>>> Unification makes a lot of sense to me. You can add
>>> Acked-by: Michal Hocko <mhocko@suse.com>
>>>
>>> I will most likely won't surprise if I asked for more here though ;)
>>
>> I'm not surprised, but definitely not in a negative sense ;) I was
>> asking myself if we could somehow rework this, too.
>>
>>> I have to confess I really detest the whole concept of a hidden callback
>>> with a very weird API. Is this something we can do about? I do realize
>>> that adding a callback would require either cluttering the existing APIs
>>> but maybe we can come up with something more clever. Or maybe existing
>>> external users of online callback can do that as a separate step after
>>> the online is completed - or is this impossible due to locking
>>> guarantees?
>>>
>>
>> The use case of this (somewhat special) callback really is to avoid
>> selected (unbacked in the hypervisor) pages to get put to the buddy just
>> now, but instead to defer that (sometimes, defer till infinity ;) ).
>> Especially, to hinder these pages from getting touched at all. Pages
>> that won't be put to the buddy will usually get PG_offline set (e.g.,
>> Hyper-V and XEN) - the only two users I am aware of.
>>
>> For Hyper-V (and also eventually virtio-mem), it is important to set
>> PG_offline before marking the section to be online (SECTION_IS_ONLINE).
>> Only this way, PG_offline is properly set on all pfn_to_online_page()
>> pages, meaning "don't touch this page" - e.g., used to skip over such
>> pages when suspending or by makedumpfile to skip over such offline pages
>> when creating a memory dump.
> 
> Thanks for the clarification. I have never really studied what those
> callbacks are doing really.
> 
>> So if we would e.g., try to piggy-back onto the memory_notify()
>> infrastructure, we could
>> 1. Online all pages to the buddy (dropping the callback)
>> 2. E.g., memory_notify(MEM_ONLINE_PAGES, &arg);
>> -> in the notifier, pull pages from the buddy, mark sections online
>> 3. Set all involved sections online (online_mem_sections())
> 
> This doesn't really sound any better. For one pages are immediately
> usable when they hit the buddy allocator so this is racy and thus not
> reliable.
> 
>> However, I am not sure what actually happens after 1. - we are only
>> holding the device hotplug lock and the memory hotplug lock, so the
>> pages can just get allocated. Also, it sounds like more work and code
>> for the same end result (okay, if the rework is really necessary, though).
>>
>> So yeah, while the current callback might not be optimal, I don't see an
>> easy and clean way to rework this. With the change in this series we are
>> at least able to simply defer doing what would have been done without
>> the callback - not perfect but better.
>>
>> Do you have anything in mind that could work out and make this nicer?
> 
> I am wondering why those pages get onlined when they are, in fact,
> supposed to be offline.
> 

It's the current way of emulating sub-memory-block hotplug on top of the
memory bock device API we have. Hyper-V and XEN have been using that for
a long time.

-- 

Thanks,

David / dhildenb

^ permalink raw reply

* Re: [PATCH v1 0/3] mm/memory_hotplug: Export generic_online_page()
From: Michal Hocko @ 2019-09-23 11:15 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Souptick Joarder, linux-hyperv,
	Andrew Morton, Dan Williams, Haiyang Zhang, K. Y. Srinivasan,
	Oscar Salvador, Pavel Tatashin, Qian Cai, Sasha Levin,
	Stephen Hemminger, Wei Yang
In-Reply-To: <df15f269-48df-8738-c714-9fae3cb3b44c@redhat.com>

On Mon 23-09-19 11:31:30, David Hildenbrand wrote:
> On 23.09.19 10:58, Michal Hocko wrote:
> > On Fri 20-09-19 10:17:54, David Hildenbrand wrote:
> >> On 09.09.19 13:48, David Hildenbrand wrote:
> >>> Based on linux/next + "[PATCH 0/3] Remove __online_page_set_limits()"
> >>>
> >>> Let's replace the __online_page...() functions by generic_online_page().
> >>> Hyper-V only wants to delay the actual onlining of un-backed pages, so we
> >>> can simpy re-use the generic function.
> >>>
> >>> Only compile-tested.
> >>>
> >>> Cc: Souptick Joarder <jrdr.linux@gmail.com>
> >>>
> >>> David Hildenbrand (3):
> >>>   mm/memory_hotplug: Export generic_online_page()
> >>>   hv_balloon: Use generic_online_page()
> >>>   mm/memory_hotplug: Remove __online_page_free() and
> >>>     __online_page_increment_counters()
> >>>
> >>>  drivers/hv/hv_balloon.c        |  3 +--
> >>>  include/linux/memory_hotplug.h |  4 +---
> >>>  mm/memory_hotplug.c            | 17 ++---------------
> >>>  3 files changed, 4 insertions(+), 20 deletions(-)
> >>>
> >>
> >> Ping, any comments on this one?
> > 
> > Unification makes a lot of sense to me. You can add
> > Acked-by: Michal Hocko <mhocko@suse.com>
> > 
> > I will most likely won't surprise if I asked for more here though ;)
> 
> I'm not surprised, but definitely not in a negative sense ;) I was
> asking myself if we could somehow rework this, too.
> 
> > I have to confess I really detest the whole concept of a hidden callback
> > with a very weird API. Is this something we can do about? I do realize
> > that adding a callback would require either cluttering the existing APIs
> > but maybe we can come up with something more clever. Or maybe existing
> > external users of online callback can do that as a separate step after
> > the online is completed - or is this impossible due to locking
> > guarantees?
> > 
> 
> The use case of this (somewhat special) callback really is to avoid
> selected (unbacked in the hypervisor) pages to get put to the buddy just
> now, but instead to defer that (sometimes, defer till infinity ;) ).
> Especially, to hinder these pages from getting touched at all. Pages
> that won't be put to the buddy will usually get PG_offline set (e.g.,
> Hyper-V and XEN) - the only two users I am aware of.
> 
> For Hyper-V (and also eventually virtio-mem), it is important to set
> PG_offline before marking the section to be online (SECTION_IS_ONLINE).
> Only this way, PG_offline is properly set on all pfn_to_online_page()
> pages, meaning "don't touch this page" - e.g., used to skip over such
> pages when suspending or by makedumpfile to skip over such offline pages
> when creating a memory dump.

Thanks for the clarification. I have never really studied what those
callbacks are doing really.

> So if we would e.g., try to piggy-back onto the memory_notify()
> infrastructure, we could
> 1. Online all pages to the buddy (dropping the callback)
> 2. E.g., memory_notify(MEM_ONLINE_PAGES, &arg);
> -> in the notifier, pull pages from the buddy, mark sections online
> 3. Set all involved sections online (online_mem_sections())

This doesn't really sound any better. For one pages are immediately
usable when they hit the buddy allocator so this is racy and thus not
reliable.

> However, I am not sure what actually happens after 1. - we are only
> holding the device hotplug lock and the memory hotplug lock, so the
> pages can just get allocated. Also, it sounds like more work and code
> for the same end result (okay, if the rework is really necessary, though).
> 
> So yeah, while the current callback might not be optimal, I don't see an
> easy and clean way to rework this. With the change in this series we are
> at least able to simply defer doing what would have been done without
> the callback - not perfect but better.
> 
> Do you have anything in mind that could work out and make this nicer?

I am wondering why those pages get onlined when they are, in fact,
supposed to be offline.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply

* Re: [PATCH v1 0/3] mm/memory_hotplug: Export generic_online_page()
From: David Hildenbrand @ 2019-09-23  9:31 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, linux-mm, Souptick Joarder, linux-hyperv,
	Andrew Morton, Dan Williams, Haiyang Zhang, K. Y. Srinivasan,
	Oscar Salvador, Pavel Tatashin, Qian Cai, Sasha Levin,
	Stephen Hemminger, Wei Yang
In-Reply-To: <20190923085807.GD6016@dhcp22.suse.cz>

On 23.09.19 10:58, Michal Hocko wrote:
> On Fri 20-09-19 10:17:54, David Hildenbrand wrote:
>> On 09.09.19 13:48, David Hildenbrand wrote:
>>> Based on linux/next + "[PATCH 0/3] Remove __online_page_set_limits()"
>>>
>>> Let's replace the __online_page...() functions by generic_online_page().
>>> Hyper-V only wants to delay the actual onlining of un-backed pages, so we
>>> can simpy re-use the generic function.
>>>
>>> Only compile-tested.
>>>
>>> Cc: Souptick Joarder <jrdr.linux@gmail.com>
>>>
>>> David Hildenbrand (3):
>>>   mm/memory_hotplug: Export generic_online_page()
>>>   hv_balloon: Use generic_online_page()
>>>   mm/memory_hotplug: Remove __online_page_free() and
>>>     __online_page_increment_counters()
>>>
>>>  drivers/hv/hv_balloon.c        |  3 +--
>>>  include/linux/memory_hotplug.h |  4 +---
>>>  mm/memory_hotplug.c            | 17 ++---------------
>>>  3 files changed, 4 insertions(+), 20 deletions(-)
>>>
>>
>> Ping, any comments on this one?
> 
> Unification makes a lot of sense to me. You can add
> Acked-by: Michal Hocko <mhocko@suse.com>
> 
> I will most likely won't surprise if I asked for more here though ;)

I'm not surprised, but definitely not in a negative sense ;) I was
asking myself if we could somehow rework this, too.

> I have to confess I really detest the whole concept of a hidden callback
> with a very weird API. Is this something we can do about? I do realize
> that adding a callback would require either cluttering the existing APIs
> but maybe we can come up with something more clever. Or maybe existing
> external users of online callback can do that as a separate step after
> the online is completed - or is this impossible due to locking
> guarantees?
> 

The use case of this (somewhat special) callback really is to avoid
selected (unbacked in the hypervisor) pages to get put to the buddy just
now, but instead to defer that (sometimes, defer till infinity ;) ).
Especially, to hinder these pages from getting touched at all. Pages
that won't be put to the buddy will usually get PG_offline set (e.g.,
Hyper-V and XEN) - the only two users I am aware of.

For Hyper-V (and also eventually virtio-mem), it is important to set
PG_offline before marking the section to be online (SECTION_IS_ONLINE).
Only this way, PG_offline is properly set on all pfn_to_online_page()
pages, meaning "don't touch this page" - e.g., used to skip over such
pages when suspending or by makedumpfile to skip over such offline pages
when creating a memory dump.

So if we would e.g., try to piggy-back onto the memory_notify()
infrastructure, we could
1. Online all pages to the buddy (dropping the callback)
2. E.g., memory_notify(MEM_ONLINE_PAGES, &arg);
-> in the notifier, pull pages from the buddy, mark sections online
3. Set all involved sections online (online_mem_sections())

However, I am not sure what actually happens after 1. - we are only
holding the device hotplug lock and the memory hotplug lock, so the
pages can just get allocated. Also, it sounds like more work and code
for the same end result (okay, if the rework is really necessary, though).

So yeah, while the current callback might not be optimal, I don't see an
easy and clean way to rework this. With the change in this series we are
at least able to simply defer doing what would have been done without
the callback - not perfect but better.

Do you have anything in mind that could work out and make this nicer?

-- 

Thanks,

David / dhildenb

^ permalink raw reply

* Re: [PATCH v1 0/3] mm/memory_hotplug: Export generic_online_page()
From: Michal Hocko @ 2019-09-23  8:58 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Souptick Joarder, linux-hyperv,
	Andrew Morton, Dan Williams, Haiyang Zhang, K. Y. Srinivasan,
	Oscar Salvador, Pavel Tatashin, Qian Cai, Sasha Levin,
	Stephen Hemminger, Wei Yang
In-Reply-To: <f73c4d0f-ad81-81a6-1107-852f2b9cad41@redhat.com>

On Fri 20-09-19 10:17:54, David Hildenbrand wrote:
> On 09.09.19 13:48, David Hildenbrand wrote:
> > Based on linux/next + "[PATCH 0/3] Remove __online_page_set_limits()"
> > 
> > Let's replace the __online_page...() functions by generic_online_page().
> > Hyper-V only wants to delay the actual onlining of un-backed pages, so we
> > can simpy re-use the generic function.
> > 
> > Only compile-tested.
> > 
> > Cc: Souptick Joarder <jrdr.linux@gmail.com>
> > 
> > David Hildenbrand (3):
> >   mm/memory_hotplug: Export generic_online_page()
> >   hv_balloon: Use generic_online_page()
> >   mm/memory_hotplug: Remove __online_page_free() and
> >     __online_page_increment_counters()
> > 
> >  drivers/hv/hv_balloon.c        |  3 +--
> >  include/linux/memory_hotplug.h |  4 +---
> >  mm/memory_hotplug.c            | 17 ++---------------
> >  3 files changed, 4 insertions(+), 20 deletions(-)
> > 
> 
> Ping, any comments on this one?

Unification makes a lot of sense to me. You can add
Acked-by: Michal Hocko <mhocko@suse.com>

I will most likely won't surprise if I asked for more here though ;)
I have to confess I really detest the whole concept of a hidden callback
with a very weird API. Is this something we can do about? I do realize
that adding a callback would require either cluttering the existing APIs
but maybe we can come up with something more clever. Or maybe existing
external users of online callback can do that as a separate step after
the online is completed - or is this impossible due to locking
guarantees?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply

* RE: [PATCH 1/3] hv_utils: Add the support of hibernation
From: Dexuan Cui @ 2019-09-21  7:26 UTC (permalink / raw)
  To: vkuznets
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, Michael Kelley
In-Reply-To: <87ftksa8dg.fsf@vitty.brq.redhat.com>

> From: Vitaly Kuznetsov <vkuznets@redhat.com>
> Sent: Thursday, September 19, 2019 3:28 AM
> 
> Dexuan Cui <decui@microsoft.com> writes:
> 
> > BTW, for vss, maybe the VM should not hibernate if there is a backup
> > ongoing? -- if the file system is frozen by hv_vss_daemon, and the VM
> > hibernates, then when the VM resumes back, it's almost always true that
> > the VM won't receive the host's VSS_OP_THAW request, and the VM will
> > end up in an unusable state.
> 
> Makes sense. Or, alternatively, can we postpone hibernation until after
> VSS_OP_THAW?
> 
> Vitaly

It looks we should not postpone that, because:
1. When we're in util_suspend(), all the user space processes have been
frozen, so even if the VM receives the VSS_OP_THAW message form the host,
there is no chance for the hv_vss_daemon to handle it. 

2. Between the window the host sends the VSS_OP_FREEZE message and the
VSS_OP_THAW mesasge, util_suspend() may jump in and close the channel,
and then the host will not send a VSS_OP_THAW.

3. The host doesn't guarantee how soon it sends the VSS_OP_THAW message,
though in practice IIRC the host *usually* sends the message soon. The
hibernation process has a watchdog of 120s set by dpm_watchdog_set(): if
dpm_suspend() (which calls util_probe()) can not finish in 120 seconds, the
hibernation will be aborted.

3 may not look like a strong reason, but generally speaking I'd like to avoid
an indeterminate dependency.

Thanks,
-- Dexuan

^ permalink raw reply

* RE: [PATCH] Input: hyperv-keyboard: Add the support of hibernation
From: Dexuan Cui @ 2019-09-21  6:56 UTC (permalink / raw)
  To: dmitry.torokhov@gmail.com
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, linux-hyperv@vger.kernel.org,
	linux-input@vger.kernel.org, linux-kernel@vger.kernel.org,
	Michael Kelley
In-Reply-To: <20190919161752.GS237523@dtor-ws>

> From: dmitry.torokhov@gmail.com <dmitry.torokhov@gmail.com>
> Sent: Thursday, September 19, 2019 9:18 AM
> 
> Hi Dexuan,
> 
> On Wed, Sep 11, 2019 at 11:36:20PM +0000, Dexuan Cui wrote:
> > We need hv_kbd_pm_notify() to make sure the pm_wakeup_hard_event()
> call
> > does not prevent the system from entering hibernation: the hibernation
> > is a relatively long process, which can be aborted by the call
> > pm_wakeup_hard_event(), which is invoked upon keyboard events.
> >
> > diff --git a/drivers/input/serio/hyperv-keyboard.c
> b/drivers/input/serio/hyperv-keyboard.c
> > index 88ae7c2..277dc4c 100644
> > --- a/drivers/input/serio/hyperv-keyboard.c
> > +++ b/drivers/input/serio/hyperv-keyboard.c
> > @@ -10,6 +10,7 @@
> >  #include <linux/hyperv.h>
> >  #include <linux/serio.h>
> >  #include <linux/slab.h>
> > +#include <linux/suspend.h>
> >
> >  /*
> >   * Current version 1.0
> > @@ -95,6 +96,9 @@ struct hv_kbd_dev {
> >  	struct completion wait_event;
> >  	spinlock_t lock; /* protects 'started' field */
> >  	bool started;
> > +
> > +	struct notifier_block pm_nb;
> > +	bool hibernation_in_progress;
> 
> Why do you use notifier block instead of exposing proper PM methods if
> you want to support hibernation?
> 
> Dmitry

Hi,
In the patch I do implement hv_kbd_suspend() and hv_kbd_resume(), and
add them into the hv_kbd_drv struct:

@@ -416,6 +472,8 @@ static struct  hv_driver hv_kbd_drv = {
        .id_table = id_table,
        .probe = hv_kbd_probe,
        .remove = hv_kbd_remove,
+       .suspend = hv_kbd_suspend,
+       .resume = hv_kbd_resume,

The .suspend and .resume callbacks are inroduced by another patch (which
uses the dev_pm_ops struct):
271b2224d42f ("Drivers: hv: vmbus: Implement suspend/resume for VSC drivers for hibernation")
(which is on the Hyper-V tree's hyperv-next branch:
https://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git/commit/?h=hyperv-next&id=271b2224d42f88870e6b060924ee374871c131fc )

The only purpose of the notifier is to set the variable 
kbd_dev->hibernation_in_progress to true during the hibernation process.

As I explained in the changelog, the hibernation is a long process (which
can take 10+ seconds), during which the user may unintentionally touch
the keyboard, causing key up/down events, which are still handled by
hv_kbd_on_receive(), which calls pm_wakeup_hard_event(), which
calls some other functions which increase the global counter
"pm_abort_suspend", and hence pm_wakeup_pending() becomes true.

pm_wakeup_pending() is tested in a lot of places in the suspend
process and eventually an unintentional keystroke (or mouse movement,
when it comes to the Hyper-V mouse driver drivers/hid/hid-hyperv.c)
causes the whole hibernation process to be aborted. Usually this
behavior is not expected by the user, I think.

So, I use the notifier to set the flag variable and with it the driver can
know when it should not call pm_wakeup_hard_event().

Thanks,
-- Dexuan

^ permalink raw reply

* RE: [PATCH] Drivers: hv: vmbus: Fix harmless building warnings without CONFIG_PM
From: Dexuan Cui @ 2019-09-20 21:02 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, Michael Kelley
In-Reply-To: <CAK8P3a0EvdR0SZdPhRV2o3PrxHo4BpJdWzAjExmKHhwrOsL54Q@mail.gmail.com>

> From: Arnd Bergmann <arnd@arndb.de>
> Sent: Friday, September 20, 2019 12:33 AM
> On Thu, Sep 19, 2019 at 11:38 PM Dexuan Cui <decui@microsoft.com> wrote:
> > > Sent: Thursday, September 19, 2019 5:11 AM
> > > On Thu, Sep 19, 2019 at 7:19 AM Dexuan Cui <decui@microsoft.com>
> wrote:
> 
> > > I think this will still produce a warning if CONFIG_PM is set but
> > > CONFIG_PM_SLEEP is not, possibly in other configurations as
> > > well.
> >
> > You're correct. Thanks!
> >
> > I'll use " #ifdef CONFIG_PM_SLEEP ... #endif" instead.
> >
> > The mentioned functions are only used in the micros
> > SET_NOIRQ_SYSTEM_SLEEP_PM_OPS, which is empty if CONFIG_PM_SLEEP
> > is not defined. So it looks to me using "#ifdef CONFIG_PM_SLEEP ..." should
> > resolve the issue.
> 
> Probably, yes. There are sometimes surprising effects, such as when one of the
> functions inside of the #ifdef call another function that is otherwise unused.

I reviewed the related functions again and I believe with the v2 we don't have
such an issue as you described here.
 
> I would normally try to build a few hundred randconfig builds to be fairly sure
> of a change like this, or use __maybe_unused like most other drivers do here.
> 
>        Arnd

I do see a lot of drivers using __maybe_unused, but IMO conditional
compilation is slightly better. In case conditional compilation still has some
unexpected issue, we can always make a further fix. :-)

Thanks,
-- Dexuan

^ permalink raw reply

* RE: [PATCH v2] Drivers: hv: vmbus: Fix harmless building warnings without CONFIG_PM_SLEEP
From: Michael Kelley @ 2019-09-20 17:41 UTC (permalink / raw)
  To: Dexuan Cui, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, arnd@arndb.de
In-Reply-To: <1568929540-116670-1-git-send-email-decui@microsoft.com>

From: Dexuan Cui <decui@microsoft.com> Sent: Thursday, September 19, 2019 2:46 PM
> 
> If CONFIG_PM_SLEEP is not set, we can comment out these functions to avoid
> the below warnings:
> 
> drivers/hv/vmbus_drv.c:2208:12: warning: ‘vmbus_bus_resume’ defined but not used [-
> Wunused-function]
> drivers/hv/vmbus_drv.c:2128:12: warning: ‘vmbus_bus_suspend’ defined but not used [-
> Wunused-function]
> drivers/hv/vmbus_drv.c:937:12: warning: ‘vmbus_resume’ defined but not used [-
> Wunused-function]
> drivers/hv/vmbus_drv.c:918:12: warning: ‘vmbus_suspend’ defined but not used [-
> Wunused-function]
> 
> Fixes: 271b2224d42f ("Drivers: hv: vmbus: Implement suspend/resume for VSC drivers for
> hibernation")
> Fixes: f53335e3289f ("Drivers: hv: vmbus: Suspend/resume the vmbus itself for
> hibernation")
> Reported-by: Arnd Bergmann <arnd@arndb.de>
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
> 
> In v2:
> 	test CONFIG_PM_SLEEP rather than CONFIG_PM. Thanks, Arnd!
> 
>  drivers/hv/vmbus_drv.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 

Reviewed-by:  Michael Kelley <mikelley@microsoft.com>

^ permalink raw reply

* RE: [PATHC v6] video: hyperv: hyperv_fb: Support deferred IO for Hyper-V frame buffer driver
From: Michael Kelley @ 2019-09-20 17:26 UTC (permalink / raw)
  To: Michael Kelley, Wei Hu, rdunlap@infradead.org, shc_work@mail.ru,
	gregkh@linuxfoundation.org, lee.jones@linaro.org,
	alexandre.belloni@bootlin.com, baijiaju1990@gmail.com,
	info@metux.net, linux-hyperv@vger.kernel.org,
	dri-devel@lists.freedesktop.org, linux-fbdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, sashal@kernel.org,
	Stephen Hemminger, Haiyang Zhang, KY Srinivasan, Dexuan Cui
In-Reply-To: <DM5PR21MB0137DA408FE59E8C1171CFFCD78E0@DM5PR21MB0137.namprd21.prod.outlook.com>

From: Michael Kelley <mikelley@microsoft.com>  Sent: Wednesday, September 18, 2019 2:48 PM
> >
> > Without deferred IO support, hyperv_fb driver informs the host to refresh
> > the entire guest frame buffer at fixed rate, e.g. at 20Hz, no matter there
> > is screen update or not. This patch supports deferred IO for screens in
> > graphics mode and also enables the frame buffer on-demand refresh. The
> > highest refresh rate is still set at 20Hz.
> >
> > Currently Hyper-V only takes a physical address from guest as the starting
> > address of frame buffer. This implies the guest must allocate contiguous
> > physical memory for frame buffer. In addition, Hyper-V Gen 2 VMs only
> > accept address from MMIO region as frame buffer address. Due to these
> > limitations on Hyper-V host, we keep a shadow copy of frame buffer
> > in the guest. This means one more copy of the dirty rectangle inside
> > guest when doing the on-demand refresh. This can be optimized in the
> > future with help from host. For now the host performance gain from deferred
> > IO outweighs the shadow copy impact in the guest.
> >
> > Signed-off-by: Wei Hu <weh@microsoft.com>

Sasha -- this patch and one other from Wei Hu for the Hyper-V frame buffer
driver should be ready.  Both patches affect only the Hyper-V frame buffer
driver so can go through the Hyper-V tree.  Can you pick these up?  Thx.

Michael

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox