Linux-HyperV List

Linux-HyperV List
 help / color / mirror / Atom feed

* RE: [PATCH] hv: vmbus: mark PM functions as __maybe_unused
From: Dexuan Cui @ 2019-09-19  5:15 UTC (permalink / raw)
  To: Arnd Bergmann, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Sasha Levin
  Cc: Michael Kelley, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <20190918200052.2261769-1-arnd@arndb.de>

> From: Arnd Bergmann <arnd@arndb.de>
> Sent: Wednesday, September 18, 2019 1:01 PM
> 
> When CONFIG_PM is disabled, we get a couple of harmless warnings:
> 
> drivers/hv/vmbus_drv.c:918:12: error: unused function 'vmbus_suspend'
> [-Werror,-Wunused-function]
> drivers/hv/vmbus_drv.c:937:12: error: unused function 'vmbus_resume'
> [-Werror,-Wunused-function]
> drivers/hv/vmbus_drv.c:2128:12: error: unused function 'vmbus_bus_suspend'
> [-Werror,-Wunused-function]
> drivers/hv/vmbus_drv.c:2208:12: error: unused function 'vmbus_bus_resume'
> [-Werror,-Wunused-function]
> 
> Mark these functions __maybe_unused to let gcc drop them silently.

Hi Arnd,
Thanks for reporting the issue!

If CONFIG_PM is not set, IMO it's better to comment out these functions. :-)

I'll post a patch for this with you Cc'd.

Thanks,
-- Dexuan

^ permalink raw reply

* [PATCH] Drivers: hv: vmbus: Fix harmless building warnings without CONFIG_PM
From: Dexuan Cui @ 2019-09-19  5:19 UTC (permalink / raw)
  To: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, Michael Kelley, arnd@arndb.de
  Cc: Dexuan Cui

If CONFIG_PM is not set, we can comment out these functions to avoid the
below warnings:

drivers/hv/vmbus_drv.c:2208:12: warning: ‘vmbus_bus_resume’ defined but not used [-Wunused-function]
drivers/hv/vmbus_drv.c:2128:12: warning: ‘vmbus_bus_suspend’ defined but not used [-Wunused-function]
drivers/hv/vmbus_drv.c:937:12: warning: ‘vmbus_resume’ defined but not used [-Wunused-function]
drivers/hv/vmbus_drv.c:918:12: warning: ‘vmbus_suspend’ defined but not used [-Wunused-function]

Fixes: 271b2224d42f ("Drivers: hv: vmbus: Implement suspend/resume for VSC drivers for hibernation")
Fixes: f53335e3289f ("Drivers: hv: vmbus: Suspend/resume the vmbus itself for hibernation")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
 drivers/hv/vmbus_drv.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 391f0b225c9a..8bfb36695413 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -912,6 +912,7 @@ static void vmbus_shutdown(struct device *child_device)
 		drv->shutdown(dev);
 }
 
+#ifdef CONFIG_PM
 /*
  * vmbus_suspend - Suspend a vmbus device
  */
@@ -949,6 +950,7 @@ static int vmbus_resume(struct device *child_device)
 
 	return drv->resume(dev);
 }
+#endif /* CONFIG_PM */
 
 /*
  * vmbus_device_release - Final callback release of the vmbus child device
@@ -1070,6 +1072,7 @@ void vmbus_on_msg_dpc(unsigned long data)
 	vmbus_signal_eom(msg, message_type);
 }
 
+#ifdef CONFIG_PM
 /*
  * Fake RESCIND_CHANNEL messages to clean up hv_sock channels by force for
  * hibernation, because hv_sock connections can not persist across hibernation.
@@ -1105,6 +1108,7 @@ static void vmbus_force_channel_rescinded(struct vmbus_channel *channel)
 		      vmbus_connection.work_queue,
 		      &ctx->work);
 }
+#endif /* CONFIG_PM */
 
 /*
  * Direct callback for channels using other deferred processing
@@ -2125,6 +2129,7 @@ static int vmbus_acpi_add(struct acpi_device *device)
 	return ret_val;
 }
 
+#ifdef CONFIG_PM
 static int vmbus_bus_suspend(struct device *dev)
 {
 	struct vmbus_channel *channel, *sc;
@@ -2247,6 +2252,7 @@ static int vmbus_bus_resume(struct device *dev)
 
 	return 0;
 }
+#endif /* CONFIG_PM */
 
 static const struct acpi_device_id vmbus_acpi_device_ids[] = {
 	{"VMBUS", 0},
-- 
2.19.1


^ permalink raw reply related

* RE: [PATCH 1/3] hv_utils: Add the support of hibernation
From: Dexuan Cui @ 2019-09-19  6:34 UTC (permalink / raw)
  To: vkuznets
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, Michael Kelley
In-Reply-To: <87pnk0bpe8.fsf@vitty.brq.redhat.com>

> From: Vitaly Kuznetsov <vkuznets@redhat.com>
> Sent: Monday, September 16, 2019 1:46 AM
> Dexuan Cui <decui@microsoft.com> writes:
> 
> >> From: Vitaly Kuznetsov <vkuznets@redhat.com>
> >> Sent: Thursday, September 12, 2019 9:37 AM
> >
> >> > +static int util_suspend(struct hv_device *dev)
> >> > +{
> >> > +	struct hv_util_service *srv = hv_get_drvdata(dev);
> >> > +
> >> > +	if (srv->util_cancel_work)
> >> > +		srv->util_cancel_work();
> >> > +
> >> > +	vmbus_close(dev->channel);
> >>
> >> And what happens if you receive a real reply from userspace after you
> >> close the channel? You've only cancelled timeout work so the driver
> >> will not try to reply by itself but this doesn't mean we won't try to
> >> write to the channel on legitimate replies from daemons.
> >>
> >> I think you need to block all userspace requests (hang in kernel until
> >> util_resume()).
> >>
> >> While I'm not sure we can't get away without it but I'd suggest we add a
> >> new HVUTIL_SUSPENDED state to the hvutil state machine.
> >> Vitaly
> >
> > When we reach util_suspend(), all the userspace processes have been
> > frozen: see kernel/power/hibernate.c: hibernate() -> freeze_processes(),
> > so here we can not receive a reply from the userspace daemon.
> >
> 
> Let's add a WARN() or something then as if this ever happens this is
> going to be realy tricky to catch.

Looking at the path hibernate() -> freeze_processes() -> 
try_to_freeze_tasks(true) -> freeze_task() -> fake_signal_wake_up(), I'm
sure when try_to_freeze_tasks(true) returns 0, all the user-space processes
must be frozen in do_signal() -> get_signal() -> try_to_freeze() -> ... -> 
__refrigerator().

hibernate () -> hibernation_snapshot () -> dpm_suspend() -> ... -> 
util_suspend() only runs after hibernate() -> freeze_processes(), so I'm
pretty sure we're safe here.

> > However, it looks there is indeed some tricky corner cases we need to deal
> > with: in util_resume(), before we call vmbus_open(), all the userspace
> > processes are still frozen, and the userspace daemon (e.g. hv_kvp_daemon)
> > can be in any of these states:
> >
> > 1. the driver has not buffered any message for the daemon. This is good.
> >
> > 2. the driver has buffered a message for the daemon, and
> > kvp_transaction.state is HVUTIL_USERSPACE_REQ. Later, the kvp daemon
> > writes the response to the driver, and in kvp_on_msg()
> > kvp_transaction.state is moved to HVUTIL_USERSPACE_RECV, but since
> > cancel_delayed_work_sync(&kvp_timeout_work) is false (the work has
> > been cancelled by util_suspend()), the driver reports nothing to the host,
> > which is good as the VM has undergone a hibernation event and IMO the
> > response may be stale and I believe the host is not expecting this
> > response from the VM at all (the host side application that requested the
> > KVP must have failed or timed out), but the bad thing is: the "state"
> > remains in HVUTIL_USERSPACE_RECV, preventing
> > hv_kvp_onchannelcallback() from reading from the channel ringbuffer.
> >
> > I suggest we work around this race condition by the below patch:
> >
> > --- a/drivers/hv/hv_kvp.c
> > +++ b/drivers/hv/hv_kvp.c
> > @@ -250,8 +250,8 @@ static int kvp_on_msg(void *msg, int len)
> >          */
> >         if (cancel_delayed_work_sync(&kvp_timeout_work)) {
> >                 kvp_respond_to_host(message, error);
> > -               hv_poll_channel(kvp_transaction.recv_channel,
> kvp_poll_wrapper);
> >         }
> > +       hv_poll_channel(kvp_transaction.recv_channel, kvp_poll_wrapper);
> >
> >         return 0;
> >  }
> >
> > How do you like this?
> >
> 
> Is it safe to call hv_poll_channel() simultaneously on several CPUs? It
> seems it is as we're doing
> 
> smp_call_function_single(channel->target_cpu, cb, channel, true);

Looks safe to me. The code has been there for years. :-)
 
> (I'm asking because if it's not, than doing what you suggest will open
> the following window: timeout function kvp_timeout_func() is already
> running but the daemon is trying to reply at the same moment).
> 
> > I can imagine there is still a small chance that the state machine can run
> > out of order, and the kvp daemon may exit due to the return values of
> > read()/write() being -EINVAL, but the chance should be small enough in
> > practice, and IMO the issue even exists in the current code when
> > hibernation is not involved, e.g. kvp_timeout_func() and kvp_on_msg()
> > can run concurrently; if kvp_on_msg() runs slowly due to some reason
> > (e.g. the kvp daemon is stopped as I'm gdb'ing it), kvp_timeout_func()
> > fires and moves the state to HVUTIL_READY; later kvp_on_msg() starts
> > to run and returns -EINVAL, and the kvp daemon will exit().
> >
> > IMHO here it looks extremely difficult to make things flawless (if that's
> > even possible), so it's necessary to ask the daemons to auto-restart once
> > they exit() unexpectedly. This can be achieved by configuring systemd
> > properly for the kvp/vss/fcopy services.
> 
> I think we can also teach daemons to ignore -EINVAL or switch to
> something like -EAGAIN in non-fatal cases.

Maybe the driver should return 0 rather than -EINVAL for the kvp daemon
in some scenarios, since kvp is never 100% reliable.

> > I'm not sure introducing a HVUTIL_SUSPENDED state would solve all
> > of the corner cases, but I'm sure that would further complicate the
> > current code, at least to me. :-)
> >
> 
> Maybe, if we don't need it than we don't. Basically, I see the only
> advantage in having such state: it makes our tricks to support
> hibernation more visible in the code.
> Vitaly

Let me think about this. 

BTW, for vss, maybe the VM should not hibernate if there is a backup 
ongoing? -- if the file system is frozen by hv_vss_daemon, and the VM
hibernates, then when the VM resumes back, it's almost always true that 
the VM won't receive the host's VSS_OP_THAW request, and the VM will
end up in an unusable state.

Thanks,
-- Dexuan


^ permalink raw reply

* RE: [PATCH 1/3] hv_utils: Add the support of hibernation
From: Vitaly Kuznetsov @ 2019-09-19 10:27 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, Michael Kelley
In-Reply-To: <PU1P153MB01694ABFED7ADAE8B40A65F7BF890@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

Dexuan Cui <decui@microsoft.com> writes:

> BTW, for vss, maybe the VM should not hibernate if there is a backup 
> ongoing? -- if the file system is frozen by hv_vss_daemon, and the VM
> hibernates, then when the VM resumes back, it's almost always true that 
> the VM won't receive the host's VSS_OP_THAW request, and the VM will
> end up in an unusable state.

Makes sense. Or, alternatively, can we postpone hibernation until after
VSS_OP_THAW?

-- 
Vitaly


^ permalink raw reply

* Re: [PATCH] Drivers: hv: vmbus: Fix harmless building warnings without CONFIG_PM
From: Arnd Bergmann @ 2019-09-19 12:10 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, Michael Kelley
In-Reply-To: <1568870297-108679-1-git-send-email-decui@microsoft.com>

On Thu, Sep 19, 2019 at 7:19 AM Dexuan Cui <decui@microsoft.com> wrote:
>
> If CONFIG_PM is not set, we can comment out these functions to avoid the
> below warnings:
>
> drivers/hv/vmbus_drv.c:2208:12: warning: ‘vmbus_bus_resume’ defined but not used [-Wunused-function]
> drivers/hv/vmbus_drv.c:2128:12: warning: ‘vmbus_bus_suspend’ defined but not used [-Wunused-function]
> drivers/hv/vmbus_drv.c:937:12: warning: ‘vmbus_resume’ defined but not used [-Wunused-function]
> drivers/hv/vmbus_drv.c:918:12: warning: ‘vmbus_suspend’ defined but not used [-Wunused-function]
>
> Fixes: 271b2224d42f ("Drivers: hv: vmbus: Implement suspend/resume for VSC drivers for hibernation")
> Fixes: f53335e3289f ("Drivers: hv: vmbus: Suspend/resume the vmbus itself for hibernation")
> Reported-by: Arnd Bergmann <arnd@arndb.de>
> Signed-off-by: Dexuan Cui <decui@microsoft.com>

I think this will still produce a warning if CONFIG_PM is set but
CONFIG_PM_SLEEP is not, possibly in other configurations as
well.

      Arnd

^ permalink raw reply

* Re: [PATCH] Input: hyperv-keyboard: Add the support of hibernation
From: dmitry.torokhov @ 2019-09-19 16:17 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, linux-hyperv@vger.kernel.org,
	linux-input@vger.kernel.org, linux-kernel@vger.kernel.org,
	Michael Kelley
In-Reply-To: <1568244975-66795-1-git-send-email-decui@microsoft.com>

Hi Dexuan,

On Wed, Sep 11, 2019 at 11:36:20PM +0000, Dexuan Cui wrote:
> We need hv_kbd_pm_notify() to make sure the pm_wakeup_hard_event() call
> does not prevent the system from entering hibernation: the hibernation
> is a relatively long process, which can be aborted by the call
> pm_wakeup_hard_event(), which is invoked upon keyboard events.
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
> 
> This patch is basically a pure Hyper-V specific change and it has a
> build dependency on the commit 271b2224d42f ("Drivers: hv: vmbus: Implement
> suspend/resume for VSC drivers for hibernation"), which is on Sasha Levin's
> Hyper-V tree's hyperv-next branch:
> https://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git/log/?h=hyperv-next
> 
> I request this patch should go through Sasha's tree rather than the
> input subsystemi's tree.
> 
> Hi Dmitry, can you please Ack?
> 
>  drivers/input/serio/hyperv-keyboard.c | 68 ++++++++++++++++++++++++++++++++---
>  1 file changed, 63 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/input/serio/hyperv-keyboard.c b/drivers/input/serio/hyperv-keyboard.c
> index 88ae7c2..277dc4c 100644
> --- a/drivers/input/serio/hyperv-keyboard.c
> +++ b/drivers/input/serio/hyperv-keyboard.c
> @@ -10,6 +10,7 @@
>  #include <linux/hyperv.h>
>  #include <linux/serio.h>
>  #include <linux/slab.h>
> +#include <linux/suspend.h>
>  
>  /*
>   * Current version 1.0
> @@ -95,6 +96,9 @@ struct hv_kbd_dev {
>  	struct completion wait_event;
>  	spinlock_t lock; /* protects 'started' field */
>  	bool started;
> +
> +	struct notifier_block pm_nb;
> +	bool hibernation_in_progress;

Why do you use notifier block instead of exposing proper PM methods if
you want to support hibernation?

Thanks.

-- 
Dmitry

^ permalink raw reply

* RE: [PATCH] Drivers: hv: vmbus: Fix harmless building warnings without CONFIG_PM
From: Dexuan Cui @ 2019-09-19 21:38 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, Michael Kelley
In-Reply-To: <CAK8P3a0oi2MQwt-P8taBt+VS+RTaoeNBgjoYNE7_L2VoQUSaEA@mail.gmail.com>

> Sent: Thursday, September 19, 2019 5:11 AM
> On Thu, Sep 19, 2019 at 7:19 AM Dexuan Cui <decui@microsoft.com> wrote:
> >
> > If CONFIG_PM is not set, we can comment out these functions to avoid the
> > below warnings:
> >
> > drivers/hv/vmbus_drv.c:2208:12: warning: ‘vmbus_bus_resume’ defined
> but not used [-Wunused-function]
> > drivers/hv/vmbus_drv.c:2128:12: warning: ‘vmbus_bus_suspend’ defined
> but not used [-Wunused-function]
> > drivers/hv/vmbus_drv.c:937:12: warning: ‘vmbus_resume’ defined but not
> used [-Wunused-function]
> > drivers/hv/vmbus_drv.c:918:12: warning: ‘vmbus_suspend’ defined but not
> used [-Wunused-function]
> >
> > Fixes: 271b2224d42f ("Drivers: hv: vmbus: Implement suspend/resume for
> VSC drivers for hibernation")
> > Fixes: f53335e3289f ("Drivers: hv: vmbus: Suspend/resume the vmbus itself
> for hibernation")
> > Reported-by: Arnd Bergmann <arnd@arndb.de>
> > Signed-off-by: Dexuan Cui <decui@microsoft.com>
> 
> I think this will still produce a warning if CONFIG_PM is set but
> CONFIG_PM_SLEEP is not, possibly in other configurations as
> well.
> 
>  Arnd

You're correct. Thanks! 

I'll use " #ifdef CONFIG_PM_SLEEP ... #endif" instead.

The mentioned functions are only used in the micros
SET_NOIRQ_SYSTEM_SLEEP_PM_OPS, which is empty if CONFIG_PM_SLEEP
is not defined. So it looks to me using "#ifdef CONFIG_PM_SLEEP ..." should
resolve the issue.

BTW, CONFIG_PM_SLEEP depends on CONFIG_PM, so if CONFIG_PM is not
defined, CONFIG_PM_SLEEP is not defined either.

Thanks,
-- Dexuan

^ permalink raw reply

* [PATCH v2] Drivers: hv: vmbus: Fix harmless building warnings without CONFIG_PM_SLEEP
From: Dexuan Cui @ 2019-09-19 21:46 UTC (permalink / raw)
  To: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, Michael Kelley, arnd@arndb.de
  Cc: Dexuan Cui

If CONFIG_PM_SLEEP is not set, we can comment out these functions to avoid
the below warnings:

drivers/hv/vmbus_drv.c:2208:12: warning: ‘vmbus_bus_resume’ defined but not used [-Wunused-function]
drivers/hv/vmbus_drv.c:2128:12: warning: ‘vmbus_bus_suspend’ defined but not used [-Wunused-function]
drivers/hv/vmbus_drv.c:937:12: warning: ‘vmbus_resume’ defined but not used [-Wunused-function]
drivers/hv/vmbus_drv.c:918:12: warning: ‘vmbus_suspend’ defined but not used [-Wunused-function]

Fixes: 271b2224d42f ("Drivers: hv: vmbus: Implement suspend/resume for VSC drivers for hibernation")
Fixes: f53335e3289f ("Drivers: hv: vmbus: Suspend/resume the vmbus itself for hibernation")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---

In v2:
	test CONFIG_PM_SLEEP rather than CONFIG_PM. Thanks, Arnd!

 drivers/hv/vmbus_drv.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 391f0b225c9a..53a60c81e220 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -912,6 +912,7 @@ static void vmbus_shutdown(struct device *child_device)
 		drv->shutdown(dev);
 }
 
+#ifdef CONFIG_PM_SLEEP
 /*
  * vmbus_suspend - Suspend a vmbus device
  */
@@ -949,6 +950,7 @@ static int vmbus_resume(struct device *child_device)
 
 	return drv->resume(dev);
 }
+#endif /* CONFIG_PM_SLEEP */
 
 /*
  * vmbus_device_release - Final callback release of the vmbus child device
@@ -1070,6 +1072,7 @@ void vmbus_on_msg_dpc(unsigned long data)
 	vmbus_signal_eom(msg, message_type);
 }
 
+#ifdef CONFIG_PM_SLEEP
 /*
  * Fake RESCIND_CHANNEL messages to clean up hv_sock channels by force for
  * hibernation, because hv_sock connections can not persist across hibernation.
@@ -1105,6 +1108,7 @@ static void vmbus_force_channel_rescinded(struct vmbus_channel *channel)
 		      vmbus_connection.work_queue,
 		      &ctx->work);
 }
+#endif /* CONFIG_PM_SLEEP */
 
 /*
  * Direct callback for channels using other deferred processing
@@ -2125,6 +2129,7 @@ static int vmbus_acpi_add(struct acpi_device *device)
 	return ret_val;
 }
 
+#ifdef CONFIG_PM_SLEEP
 static int vmbus_bus_suspend(struct device *dev)
 {
 	struct vmbus_channel *channel, *sc;
@@ -2247,6 +2252,7 @@ static int vmbus_bus_resume(struct device *dev)
 
 	return 0;
 }
+#endif /* CONFIG_PM_SLEEP */
 
 static const struct acpi_device_id vmbus_acpi_device_ids[] = {
 	{"VMBUS", 0},
-- 
2.19.1


^ permalink raw reply related

* RE: [PATCH v5 1/2] drivers: hv: vmbus: Introduce latency testing
From: Michael Kelley @ 2019-09-19 22:52 UTC (permalink / raw)
  To: brandonbonaby94, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org
  Cc: brandonbonaby94, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <83b5fc34e8f25c882f2502931f766ef547c6c950.1568320416.git.brandonbonaby94@gmail.com>

From: Branden Bonaby <brandonbonaby94@gmail.com> Sent: Thursday, September 12, 2019 7:32 PM
> 
> +
> +static int hv_debugfs_delay_set(void *data, u64 val)
> +{
> +	int ret = 0;
> +
> +	if (val >= 0 && val <= 1000)
> +		*(u32 *)data = val;
> +	else
> +		ret = -EINVAL;
> +
> +	return ret;
> +}

I should probably quit picking at your code, but I'm going to
do it one more time. :-)

The above test for val >=0 is redundant as 'val' is declared
as 'u64'.  As an unsigned value, it will always be >= 0.  More
broadly, the above function could be written as follows
with no loss of clarity.  This accomplishes the same thing in
only 4 lines of code instead of 6, and the main execution path
is in the sequential execution flow, not in an 'if' statement.

{
	if (val > 1000)
		return -EINVAL;
	*(u32 *)data = val;
	return 0;
}

Your code is correct as written, so this is arguably more a
matter of style, but Linux generally likes to do things clearly
and compactly with no extra motion.

> +/* Delay buffer/message reads on a vmbus channel */
> +void hv_debug_delay_test(struct vmbus_channel *channel, enum delay delay_type)
> +{
> +	struct vmbus_channel *test_channel =    channel->primary_channel ?
> +						channel->primary_channel :
> +						channel;
> +	bool state = test_channel->fuzz_testing_state;
> +
> +	if (state) {
> +		if (delay_type == 0)
> +			udelay(test_channel->fuzz_testing_interrupt_delay);
> +		else
> +			udelay(test_channel->fuzz_testing_message_delay);

This 'if/else' statement got me thinking.  You have an enum declared below
that lists the two options -- INTERRUPT_DELAY or MESSAGE_DELAY.  The
implication is that we might add more options in the future.  But the
above 'if/else' statement isn't really set up to easily add more options, and
the individual fields for fuzz_testing_interrupt_delay and
fuzz_testing_message_delay mean adding more branches to the 'if/else'
statement whenever a new DELAY type is added to the enum.   And the
same is true when adding the entries into debugfs.  A more general
solution might use arrays and loops, and treat the enum value as an
index into an array of delay values.  Extending to add another delay type
could be as easy as adding another entry to the enum declaration.

The current code is for the case where n=2 (i.e., two different delay
types), and as such probably doesn't warrant the full index/looping
treatment.  But in the future, if we add additional delay types, we'll
probably revise the code to do the index/looping approach.

So to be clear, at this point I'm not asking you to change the existing
code.  My comments are more of an observation and something to
think about in the future.

> 
> +enum delay {
> +	INTERRUPT_DELAY = 0,
> +	MESSAGE_DELAY   = 1,
> +};
> +

Michael

^ permalink raw reply

* Re: [PATCH] Drivers: hv: vmbus: Fix harmless building warnings without CONFIG_PM
From: Arnd Bergmann @ 2019-09-20  7:33 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, Michael Kelley
In-Reply-To: <PU1P153MB016971CD922FC453F3E31E04BF890@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

On Thu, Sep 19, 2019 at 11:38 PM Dexuan Cui <decui@microsoft.com> wrote:
> > Sent: Thursday, September 19, 2019 5:11 AM
> > On Thu, Sep 19, 2019 at 7:19 AM Dexuan Cui <decui@microsoft.com> wrote:

> > I think this will still produce a warning if CONFIG_PM is set but
> > CONFIG_PM_SLEEP is not, possibly in other configurations as
> > well.
>
> You're correct. Thanks!
>
> I'll use " #ifdef CONFIG_PM_SLEEP ... #endif" instead.
>
> The mentioned functions are only used in the micros
> SET_NOIRQ_SYSTEM_SLEEP_PM_OPS, which is empty if CONFIG_PM_SLEEP
> is not defined. So it looks to me using "#ifdef CONFIG_PM_SLEEP ..." should
> resolve the issue.

Probably, yes. There are sometimes surprising effects, such as when one of the
functions inside of the #ifdef call another function that is otherwise unused.

I would normally try to build a few hundred randconfig builds to be fairly sure
of a change like this, or use __maybe_unused like most other drivers do here.

       Arnd

^ permalink raw reply

* Re: [PATCH v1 0/3] mm/memory_hotplug: Export generic_online_page()
From: David Hildenbrand @ 2019-09-20  8:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, Souptick Joarder, linux-hyperv, Andrew Morton,
	Dan Williams, Haiyang Zhang, K. Y. Srinivasan, Michal Hocko,
	Oscar Salvador, Pavel Tatashin, Qian Cai, Sasha Levin,
	Stephen Hemminger, Wei Yang
In-Reply-To: <20190909114830.662-1-david@redhat.com>

On 09.09.19 13:48, David Hildenbrand wrote:
> Based on linux/next + "[PATCH 0/3] Remove __online_page_set_limits()"
> 
> Let's replace the __online_page...() functions by generic_online_page().
> Hyper-V only wants to delay the actual onlining of un-backed pages, so we
> can simpy re-use the generic function.
> 
> Only compile-tested.
> 
> Cc: Souptick Joarder <jrdr.linux@gmail.com>
> 
> David Hildenbrand (3):
>   mm/memory_hotplug: Export generic_online_page()
>   hv_balloon: Use generic_online_page()
>   mm/memory_hotplug: Remove __online_page_free() and
>     __online_page_increment_counters()
> 
>  drivers/hv/hv_balloon.c        |  3 +--
>  include/linux/memory_hotplug.h |  4 +---
>  mm/memory_hotplug.c            | 17 ++---------------
>  3 files changed, 4 insertions(+), 20 deletions(-)
> 

Ping, any comments on this one?

-- 

Thanks,

David / dhildenb

^ permalink raw reply

* RE: [PATHC v6] video: hyperv: hyperv_fb: Support deferred IO for Hyper-V frame buffer driver
From: Michael Kelley @ 2019-09-20 17:26 UTC (permalink / raw)
  To: Michael Kelley, Wei Hu, rdunlap@infradead.org, shc_work@mail.ru,
	gregkh@linuxfoundation.org, lee.jones@linaro.org,
	alexandre.belloni@bootlin.com, baijiaju1990@gmail.com,
	info@metux.net, linux-hyperv@vger.kernel.org,
	dri-devel@lists.freedesktop.org, linux-fbdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, sashal@kernel.org,
	Stephen Hemminger, Haiyang Zhang, KY Srinivasan, Dexuan Cui
In-Reply-To: <DM5PR21MB0137DA408FE59E8C1171CFFCD78E0@DM5PR21MB0137.namprd21.prod.outlook.com>

From: Michael Kelley <mikelley@microsoft.com>  Sent: Wednesday, September 18, 2019 2:48 PM
> >
> > Without deferred IO support, hyperv_fb driver informs the host to refresh
> > the entire guest frame buffer at fixed rate, e.g. at 20Hz, no matter there
> > is screen update or not. This patch supports deferred IO for screens in
> > graphics mode and also enables the frame buffer on-demand refresh. The
> > highest refresh rate is still set at 20Hz.
> >
> > Currently Hyper-V only takes a physical address from guest as the starting
> > address of frame buffer. This implies the guest must allocate contiguous
> > physical memory for frame buffer. In addition, Hyper-V Gen 2 VMs only
> > accept address from MMIO region as frame buffer address. Due to these
> > limitations on Hyper-V host, we keep a shadow copy of frame buffer
> > in the guest. This means one more copy of the dirty rectangle inside
> > guest when doing the on-demand refresh. This can be optimized in the
> > future with help from host. For now the host performance gain from deferred
> > IO outweighs the shadow copy impact in the guest.
> >
> > Signed-off-by: Wei Hu <weh@microsoft.com>

Sasha -- this patch and one other from Wei Hu for the Hyper-V frame buffer
driver should be ready.  Both patches affect only the Hyper-V frame buffer
driver so can go through the Hyper-V tree.  Can you pick these up?  Thx.

Michael

^ permalink raw reply

* RE: [PATCH v2] Drivers: hv: vmbus: Fix harmless building warnings without CONFIG_PM_SLEEP
From: Michael Kelley @ 2019-09-20 17:41 UTC (permalink / raw)
  To: Dexuan Cui, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, arnd@arndb.de
In-Reply-To: <1568929540-116670-1-git-send-email-decui@microsoft.com>

From: Dexuan Cui <decui@microsoft.com> Sent: Thursday, September 19, 2019 2:46 PM
> 
> If CONFIG_PM_SLEEP is not set, we can comment out these functions to avoid
> the below warnings:
> 
> drivers/hv/vmbus_drv.c:2208:12: warning: ‘vmbus_bus_resume’ defined but not used [-
> Wunused-function]
> drivers/hv/vmbus_drv.c:2128:12: warning: ‘vmbus_bus_suspend’ defined but not used [-
> Wunused-function]
> drivers/hv/vmbus_drv.c:937:12: warning: ‘vmbus_resume’ defined but not used [-
> Wunused-function]
> drivers/hv/vmbus_drv.c:918:12: warning: ‘vmbus_suspend’ defined but not used [-
> Wunused-function]
> 
> Fixes: 271b2224d42f ("Drivers: hv: vmbus: Implement suspend/resume for VSC drivers for
> hibernation")
> Fixes: f53335e3289f ("Drivers: hv: vmbus: Suspend/resume the vmbus itself for
> hibernation")
> Reported-by: Arnd Bergmann <arnd@arndb.de>
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
> 
> In v2:
> 	test CONFIG_PM_SLEEP rather than CONFIG_PM. Thanks, Arnd!
> 
>  drivers/hv/vmbus_drv.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 

Reviewed-by:  Michael Kelley <mikelley@microsoft.com>

^ permalink raw reply

* RE: [PATCH] Drivers: hv: vmbus: Fix harmless building warnings without CONFIG_PM
From: Dexuan Cui @ 2019-09-20 21:02 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, Michael Kelley
In-Reply-To: <CAK8P3a0EvdR0SZdPhRV2o3PrxHo4BpJdWzAjExmKHhwrOsL54Q@mail.gmail.com>

> From: Arnd Bergmann <arnd@arndb.de>
> Sent: Friday, September 20, 2019 12:33 AM
> On Thu, Sep 19, 2019 at 11:38 PM Dexuan Cui <decui@microsoft.com> wrote:
> > > Sent: Thursday, September 19, 2019 5:11 AM
> > > On Thu, Sep 19, 2019 at 7:19 AM Dexuan Cui <decui@microsoft.com>
> wrote:
> 
> > > I think this will still produce a warning if CONFIG_PM is set but
> > > CONFIG_PM_SLEEP is not, possibly in other configurations as
> > > well.
> >
> > You're correct. Thanks!
> >
> > I'll use " #ifdef CONFIG_PM_SLEEP ... #endif" instead.
> >
> > The mentioned functions are only used in the micros
> > SET_NOIRQ_SYSTEM_SLEEP_PM_OPS, which is empty if CONFIG_PM_SLEEP
> > is not defined. So it looks to me using "#ifdef CONFIG_PM_SLEEP ..." should
> > resolve the issue.
> 
> Probably, yes. There are sometimes surprising effects, such as when one of the
> functions inside of the #ifdef call another function that is otherwise unused.

I reviewed the related functions again and I believe with the v2 we don't have
such an issue as you described here.
 
> I would normally try to build a few hundred randconfig builds to be fairly sure
> of a change like this, or use __maybe_unused like most other drivers do here.
> 
>        Arnd

I do see a lot of drivers using __maybe_unused, but IMO conditional
compilation is slightly better. In case conditional compilation still has some
unexpected issue, we can always make a further fix. :-)

Thanks,
-- Dexuan

^ permalink raw reply

* RE: [PATCH] Input: hyperv-keyboard: Add the support of hibernation
From: Dexuan Cui @ 2019-09-21  6:56 UTC (permalink / raw)
  To: dmitry.torokhov@gmail.com
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, linux-hyperv@vger.kernel.org,
	linux-input@vger.kernel.org, linux-kernel@vger.kernel.org,
	Michael Kelley
In-Reply-To: <20190919161752.GS237523@dtor-ws>

> From: dmitry.torokhov@gmail.com <dmitry.torokhov@gmail.com>
> Sent: Thursday, September 19, 2019 9:18 AM
> 
> Hi Dexuan,
> 
> On Wed, Sep 11, 2019 at 11:36:20PM +0000, Dexuan Cui wrote:
> > We need hv_kbd_pm_notify() to make sure the pm_wakeup_hard_event()
> call
> > does not prevent the system from entering hibernation: the hibernation
> > is a relatively long process, which can be aborted by the call
> > pm_wakeup_hard_event(), which is invoked upon keyboard events.
> >
> > diff --git a/drivers/input/serio/hyperv-keyboard.c
> b/drivers/input/serio/hyperv-keyboard.c
> > index 88ae7c2..277dc4c 100644
> > --- a/drivers/input/serio/hyperv-keyboard.c
> > +++ b/drivers/input/serio/hyperv-keyboard.c
> > @@ -10,6 +10,7 @@
> >  #include <linux/hyperv.h>
> >  #include <linux/serio.h>
> >  #include <linux/slab.h>
> > +#include <linux/suspend.h>
> >
> >  /*
> >   * Current version 1.0
> > @@ -95,6 +96,9 @@ struct hv_kbd_dev {
> >  	struct completion wait_event;
> >  	spinlock_t lock; /* protects 'started' field */
> >  	bool started;
> > +
> > +	struct notifier_block pm_nb;
> > +	bool hibernation_in_progress;
> 
> Why do you use notifier block instead of exposing proper PM methods if
> you want to support hibernation?
> 
> Dmitry

Hi,
In the patch I do implement hv_kbd_suspend() and hv_kbd_resume(), and
add them into the hv_kbd_drv struct:

@@ -416,6 +472,8 @@ static struct  hv_driver hv_kbd_drv = {
        .id_table = id_table,
        .probe = hv_kbd_probe,
        .remove = hv_kbd_remove,
+       .suspend = hv_kbd_suspend,
+       .resume = hv_kbd_resume,

The .suspend and .resume callbacks are inroduced by another patch (which
uses the dev_pm_ops struct):
271b2224d42f ("Drivers: hv: vmbus: Implement suspend/resume for VSC drivers for hibernation")
(which is on the Hyper-V tree's hyperv-next branch:
https://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git/commit/?h=hyperv-next&id=271b2224d42f88870e6b060924ee374871c131fc )

The only purpose of the notifier is to set the variable 
kbd_dev->hibernation_in_progress to true during the hibernation process.

As I explained in the changelog, the hibernation is a long process (which
can take 10+ seconds), during which the user may unintentionally touch
the keyboard, causing key up/down events, which are still handled by
hv_kbd_on_receive(), which calls pm_wakeup_hard_event(), which
calls some other functions which increase the global counter
"pm_abort_suspend", and hence pm_wakeup_pending() becomes true.

pm_wakeup_pending() is tested in a lot of places in the suspend
process and eventually an unintentional keystroke (or mouse movement,
when it comes to the Hyper-V mouse driver drivers/hid/hid-hyperv.c)
causes the whole hibernation process to be aborted. Usually this
behavior is not expected by the user, I think.

So, I use the notifier to set the flag variable and with it the driver can
know when it should not call pm_wakeup_hard_event().

Thanks,
-- Dexuan

^ permalink raw reply

* RE: [PATCH 1/3] hv_utils: Add the support of hibernation
From: Dexuan Cui @ 2019-09-21  7:26 UTC (permalink / raw)
  To: vkuznets
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, Michael Kelley
In-Reply-To: <87ftksa8dg.fsf@vitty.brq.redhat.com>

> From: Vitaly Kuznetsov <vkuznets@redhat.com>
> Sent: Thursday, September 19, 2019 3:28 AM
> 
> Dexuan Cui <decui@microsoft.com> writes:
> 
> > BTW, for vss, maybe the VM should not hibernate if there is a backup
> > ongoing? -- if the file system is frozen by hv_vss_daemon, and the VM
> > hibernates, then when the VM resumes back, it's almost always true that
> > the VM won't receive the host's VSS_OP_THAW request, and the VM will
> > end up in an unusable state.
> 
> Makes sense. Or, alternatively, can we postpone hibernation until after
> VSS_OP_THAW?
> 
> Vitaly

It looks we should not postpone that, because:
1. When we're in util_suspend(), all the user space processes have been
frozen, so even if the VM receives the VSS_OP_THAW message form the host,
there is no chance for the hv_vss_daemon to handle it. 

2. Between the window the host sends the VSS_OP_FREEZE message and the
VSS_OP_THAW mesasge, util_suspend() may jump in and close the channel,
and then the host will not send a VSS_OP_THAW.

3. The host doesn't guarantee how soon it sends the VSS_OP_THAW message,
though in practice IIRC the host *usually* sends the message soon. The
hibernation process has a watchdog of 120s set by dpm_watchdog_set(): if
dpm_suspend() (which calls util_probe()) can not finish in 120 seconds, the
hibernation will be aborted.

3 may not look like a strong reason, but generally speaking I'd like to avoid
an indeterminate dependency.

Thanks,
-- Dexuan

^ permalink raw reply

* Re: [PATCH v1 0/3] mm/memory_hotplug: Export generic_online_page()
From: Michal Hocko @ 2019-09-23  8:58 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Souptick Joarder, linux-hyperv,
	Andrew Morton, Dan Williams, Haiyang Zhang, K. Y. Srinivasan,
	Oscar Salvador, Pavel Tatashin, Qian Cai, Sasha Levin,
	Stephen Hemminger, Wei Yang
In-Reply-To: <f73c4d0f-ad81-81a6-1107-852f2b9cad41@redhat.com>

On Fri 20-09-19 10:17:54, David Hildenbrand wrote:
> On 09.09.19 13:48, David Hildenbrand wrote:
> > Based on linux/next + "[PATCH 0/3] Remove __online_page_set_limits()"
> > 
> > Let's replace the __online_page...() functions by generic_online_page().
> > Hyper-V only wants to delay the actual onlining of un-backed pages, so we
> > can simpy re-use the generic function.
> > 
> > Only compile-tested.
> > 
> > Cc: Souptick Joarder <jrdr.linux@gmail.com>
> > 
> > David Hildenbrand (3):
> >   mm/memory_hotplug: Export generic_online_page()
> >   hv_balloon: Use generic_online_page()
> >   mm/memory_hotplug: Remove __online_page_free() and
> >     __online_page_increment_counters()
> > 
> >  drivers/hv/hv_balloon.c        |  3 +--
> >  include/linux/memory_hotplug.h |  4 +---
> >  mm/memory_hotplug.c            | 17 ++---------------
> >  3 files changed, 4 insertions(+), 20 deletions(-)
> > 
> 
> Ping, any comments on this one?

Unification makes a lot of sense to me. You can add
Acked-by: Michal Hocko <mhocko@suse.com>

I will most likely won't surprise if I asked for more here though ;)
I have to confess I really detest the whole concept of a hidden callback
with a very weird API. Is this something we can do about? I do realize
that adding a callback would require either cluttering the existing APIs
but maybe we can come up with something more clever. Or maybe existing
external users of online callback can do that as a separate step after
the online is completed - or is this impossible due to locking
guarantees?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply

* Re: [PATCH v1 0/3] mm/memory_hotplug: Export generic_online_page()
From: David Hildenbrand @ 2019-09-23  9:31 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, linux-mm, Souptick Joarder, linux-hyperv,
	Andrew Morton, Dan Williams, Haiyang Zhang, K. Y. Srinivasan,
	Oscar Salvador, Pavel Tatashin, Qian Cai, Sasha Levin,
	Stephen Hemminger, Wei Yang
In-Reply-To: <20190923085807.GD6016@dhcp22.suse.cz>

On 23.09.19 10:58, Michal Hocko wrote:
> On Fri 20-09-19 10:17:54, David Hildenbrand wrote:
>> On 09.09.19 13:48, David Hildenbrand wrote:
>>> Based on linux/next + "[PATCH 0/3] Remove __online_page_set_limits()"
>>>
>>> Let's replace the __online_page...() functions by generic_online_page().
>>> Hyper-V only wants to delay the actual onlining of un-backed pages, so we
>>> can simpy re-use the generic function.
>>>
>>> Only compile-tested.
>>>
>>> Cc: Souptick Joarder <jrdr.linux@gmail.com>
>>>
>>> David Hildenbrand (3):
>>>   mm/memory_hotplug: Export generic_online_page()
>>>   hv_balloon: Use generic_online_page()
>>>   mm/memory_hotplug: Remove __online_page_free() and
>>>     __online_page_increment_counters()
>>>
>>>  drivers/hv/hv_balloon.c        |  3 +--
>>>  include/linux/memory_hotplug.h |  4 +---
>>>  mm/memory_hotplug.c            | 17 ++---------------
>>>  3 files changed, 4 insertions(+), 20 deletions(-)
>>>
>>
>> Ping, any comments on this one?
> 
> Unification makes a lot of sense to me. You can add
> Acked-by: Michal Hocko <mhocko@suse.com>
> 
> I will most likely won't surprise if I asked for more here though ;)

I'm not surprised, but definitely not in a negative sense ;) I was
asking myself if we could somehow rework this, too.

> I have to confess I really detest the whole concept of a hidden callback
> with a very weird API. Is this something we can do about? I do realize
> that adding a callback would require either cluttering the existing APIs
> but maybe we can come up with something more clever. Or maybe existing
> external users of online callback can do that as a separate step after
> the online is completed - or is this impossible due to locking
> guarantees?
> 

The use case of this (somewhat special) callback really is to avoid
selected (unbacked in the hypervisor) pages to get put to the buddy just
now, but instead to defer that (sometimes, defer till infinity ;) ).
Especially, to hinder these pages from getting touched at all. Pages
that won't be put to the buddy will usually get PG_offline set (e.g.,
Hyper-V and XEN) - the only two users I am aware of.

For Hyper-V (and also eventually virtio-mem), it is important to set
PG_offline before marking the section to be online (SECTION_IS_ONLINE).
Only this way, PG_offline is properly set on all pfn_to_online_page()
pages, meaning "don't touch this page" - e.g., used to skip over such
pages when suspending or by makedumpfile to skip over such offline pages
when creating a memory dump.

So if we would e.g., try to piggy-back onto the memory_notify()
infrastructure, we could
1. Online all pages to the buddy (dropping the callback)
2. E.g., memory_notify(MEM_ONLINE_PAGES, &arg);
-> in the notifier, pull pages from the buddy, mark sections online
3. Set all involved sections online (online_mem_sections())

However, I am not sure what actually happens after 1. - we are only
holding the device hotplug lock and the memory hotplug lock, so the
pages can just get allocated. Also, it sounds like more work and code
for the same end result (okay, if the rework is really necessary, though).

So yeah, while the current callback might not be optimal, I don't see an
easy and clean way to rework this. With the change in this series we are
at least able to simply defer doing what would have been done without
the callback - not perfect but better.

Do you have anything in mind that could work out and make this nicer?

-- 

Thanks,

David / dhildenb

^ permalink raw reply

* Re: [PATCH v1 0/3] mm/memory_hotplug: Export generic_online_page()
From: Michal Hocko @ 2019-09-23 11:15 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Souptick Joarder, linux-hyperv,
	Andrew Morton, Dan Williams, Haiyang Zhang, K. Y. Srinivasan,
	Oscar Salvador, Pavel Tatashin, Qian Cai, Sasha Levin,
	Stephen Hemminger, Wei Yang
In-Reply-To: <df15f269-48df-8738-c714-9fae3cb3b44c@redhat.com>

On Mon 23-09-19 11:31:30, David Hildenbrand wrote:
> On 23.09.19 10:58, Michal Hocko wrote:
> > On Fri 20-09-19 10:17:54, David Hildenbrand wrote:
> >> On 09.09.19 13:48, David Hildenbrand wrote:
> >>> Based on linux/next + "[PATCH 0/3] Remove __online_page_set_limits()"
> >>>
> >>> Let's replace the __online_page...() functions by generic_online_page().
> >>> Hyper-V only wants to delay the actual onlining of un-backed pages, so we
> >>> can simpy re-use the generic function.
> >>>
> >>> Only compile-tested.
> >>>
> >>> Cc: Souptick Joarder <jrdr.linux@gmail.com>
> >>>
> >>> David Hildenbrand (3):
> >>>   mm/memory_hotplug: Export generic_online_page()
> >>>   hv_balloon: Use generic_online_page()
> >>>   mm/memory_hotplug: Remove __online_page_free() and
> >>>     __online_page_increment_counters()
> >>>
> >>>  drivers/hv/hv_balloon.c        |  3 +--
> >>>  include/linux/memory_hotplug.h |  4 +---
> >>>  mm/memory_hotplug.c            | 17 ++---------------
> >>>  3 files changed, 4 insertions(+), 20 deletions(-)
> >>>
> >>
> >> Ping, any comments on this one?
> > 
> > Unification makes a lot of sense to me. You can add
> > Acked-by: Michal Hocko <mhocko@suse.com>
> > 
> > I will most likely won't surprise if I asked for more here though ;)
> 
> I'm not surprised, but definitely not in a negative sense ;) I was
> asking myself if we could somehow rework this, too.
> 
> > I have to confess I really detest the whole concept of a hidden callback
> > with a very weird API. Is this something we can do about? I do realize
> > that adding a callback would require either cluttering the existing APIs
> > but maybe we can come up with something more clever. Or maybe existing
> > external users of online callback can do that as a separate step after
> > the online is completed - or is this impossible due to locking
> > guarantees?
> > 
> 
> The use case of this (somewhat special) callback really is to avoid
> selected (unbacked in the hypervisor) pages to get put to the buddy just
> now, but instead to defer that (sometimes, defer till infinity ;) ).
> Especially, to hinder these pages from getting touched at all. Pages
> that won't be put to the buddy will usually get PG_offline set (e.g.,
> Hyper-V and XEN) - the only two users I am aware of.
> 
> For Hyper-V (and also eventually virtio-mem), it is important to set
> PG_offline before marking the section to be online (SECTION_IS_ONLINE).
> Only this way, PG_offline is properly set on all pfn_to_online_page()
> pages, meaning "don't touch this page" - e.g., used to skip over such
> pages when suspending or by makedumpfile to skip over such offline pages
> when creating a memory dump.

Thanks for the clarification. I have never really studied what those
callbacks are doing really.

> So if we would e.g., try to piggy-back onto the memory_notify()
> infrastructure, we could
> 1. Online all pages to the buddy (dropping the callback)
> 2. E.g., memory_notify(MEM_ONLINE_PAGES, &arg);
> -> in the notifier, pull pages from the buddy, mark sections online
> 3. Set all involved sections online (online_mem_sections())

This doesn't really sound any better. For one pages are immediately
usable when they hit the buddy allocator so this is racy and thus not
reliable.

> However, I am not sure what actually happens after 1. - we are only
> holding the device hotplug lock and the memory hotplug lock, so the
> pages can just get allocated. Also, it sounds like more work and code
> for the same end result (okay, if the rework is really necessary, though).
> 
> So yeah, while the current callback might not be optimal, I don't see an
> easy and clean way to rework this. With the change in this series we are
> at least able to simply defer doing what would have been done without
> the callback - not perfect but better.
> 
> Do you have anything in mind that could work out and make this nicer?

I am wondering why those pages get onlined when they are, in fact,
supposed to be offline.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply

* Re: [PATCH v1 0/3] mm/memory_hotplug: Export generic_online_page()
From: David Hildenbrand @ 2019-09-23 11:34 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, linux-mm, Souptick Joarder, linux-hyperv,
	Andrew Morton, Dan Williams, Haiyang Zhang, K. Y. Srinivasan,
	Oscar Salvador, Pavel Tatashin, Qian Cai, Sasha Levin,
	Stephen Hemminger, Wei Yang
In-Reply-To: <20190923111559.GK6016@dhcp22.suse.cz>

On 23.09.19 13:15, Michal Hocko wrote:
> On Mon 23-09-19 11:31:30, David Hildenbrand wrote:
>> On 23.09.19 10:58, Michal Hocko wrote:
>>> On Fri 20-09-19 10:17:54, David Hildenbrand wrote:
>>>> On 09.09.19 13:48, David Hildenbrand wrote:
>>>>> Based on linux/next + "[PATCH 0/3] Remove __online_page_set_limits()"
>>>>>
>>>>> Let's replace the __online_page...() functions by generic_online_page().
>>>>> Hyper-V only wants to delay the actual onlining of un-backed pages, so we
>>>>> can simpy re-use the generic function.
>>>>>
>>>>> Only compile-tested.
>>>>>
>>>>> Cc: Souptick Joarder <jrdr.linux@gmail.com>
>>>>>
>>>>> David Hildenbrand (3):
>>>>>   mm/memory_hotplug: Export generic_online_page()
>>>>>   hv_balloon: Use generic_online_page()
>>>>>   mm/memory_hotplug: Remove __online_page_free() and
>>>>>     __online_page_increment_counters()
>>>>>
>>>>>  drivers/hv/hv_balloon.c        |  3 +--
>>>>>  include/linux/memory_hotplug.h |  4 +---
>>>>>  mm/memory_hotplug.c            | 17 ++---------------
>>>>>  3 files changed, 4 insertions(+), 20 deletions(-)
>>>>>
>>>>
>>>> Ping, any comments on this one?
>>>
>>> Unification makes a lot of sense to me. You can add
>>> Acked-by: Michal Hocko <mhocko@suse.com>
>>>
>>> I will most likely won't surprise if I asked for more here though ;)
>>
>> I'm not surprised, but definitely not in a negative sense ;) I was
>> asking myself if we could somehow rework this, too.
>>
>>> I have to confess I really detest the whole concept of a hidden callback
>>> with a very weird API. Is this something we can do about? I do realize
>>> that adding a callback would require either cluttering the existing APIs
>>> but maybe we can come up with something more clever. Or maybe existing
>>> external users of online callback can do that as a separate step after
>>> the online is completed - or is this impossible due to locking
>>> guarantees?
>>>
>>
>> The use case of this (somewhat special) callback really is to avoid
>> selected (unbacked in the hypervisor) pages to get put to the buddy just
>> now, but instead to defer that (sometimes, defer till infinity ;) ).
>> Especially, to hinder these pages from getting touched at all. Pages
>> that won't be put to the buddy will usually get PG_offline set (e.g.,
>> Hyper-V and XEN) - the only two users I am aware of.
>>
>> For Hyper-V (and also eventually virtio-mem), it is important to set
>> PG_offline before marking the section to be online (SECTION_IS_ONLINE).
>> Only this way, PG_offline is properly set on all pfn_to_online_page()
>> pages, meaning "don't touch this page" - e.g., used to skip over such
>> pages when suspending or by makedumpfile to skip over such offline pages
>> when creating a memory dump.
> 
> Thanks for the clarification. I have never really studied what those
> callbacks are doing really.
> 
>> So if we would e.g., try to piggy-back onto the memory_notify()
>> infrastructure, we could
>> 1. Online all pages to the buddy (dropping the callback)
>> 2. E.g., memory_notify(MEM_ONLINE_PAGES, &arg);
>> -> in the notifier, pull pages from the buddy, mark sections online
>> 3. Set all involved sections online (online_mem_sections())
> 
> This doesn't really sound any better. For one pages are immediately
> usable when they hit the buddy allocator so this is racy and thus not
> reliable.
> 
>> However, I am not sure what actually happens after 1. - we are only
>> holding the device hotplug lock and the memory hotplug lock, so the
>> pages can just get allocated. Also, it sounds like more work and code
>> for the same end result (okay, if the rework is really necessary, though).
>>
>> So yeah, while the current callback might not be optimal, I don't see an
>> easy and clean way to rework this. With the change in this series we are
>> at least able to simply defer doing what would have been done without
>> the callback - not perfect but better.
>>
>> Do you have anything in mind that could work out and make this nicer?
> 
> I am wondering why those pages get onlined when they are, in fact,
> supposed to be offline.
> 

It's the current way of emulating sub-memory-block hotplug on top of the
memory bock device API we have. Hyper-V and XEN have been using that for
a long time.

-- 

Thanks,

David / dhildenb

^ permalink raw reply

* Re: [PATCH v1 0/3] mm/memory_hotplug: Export generic_online_page()
From: David Hildenbrand @ 2019-09-23 11:43 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, linux-mm, Souptick Joarder, linux-hyperv,
	Andrew Morton, Dan Williams, Haiyang Zhang, K. Y. Srinivasan,
	Oscar Salvador, Pavel Tatashin, Qian Cai, Sasha Levin,
	Stephen Hemminger, Wei Yang
In-Reply-To: <88ac3511-4ad8-d5c8-8e6a-0cca0a0f0989@redhat.com>

On 23.09.19 13:34, David Hildenbrand wrote:
> On 23.09.19 13:15, Michal Hocko wrote:
>> On Mon 23-09-19 11:31:30, David Hildenbrand wrote:
>>> On 23.09.19 10:58, Michal Hocko wrote:
>>>> On Fri 20-09-19 10:17:54, David Hildenbrand wrote:
>>>>> On 09.09.19 13:48, David Hildenbrand wrote:
>>>>>> Based on linux/next + "[PATCH 0/3] Remove __online_page_set_limits()"
>>>>>>
>>>>>> Let's replace the __online_page...() functions by generic_online_page().
>>>>>> Hyper-V only wants to delay the actual onlining of un-backed pages, so we
>>>>>> can simpy re-use the generic function.
>>>>>>
>>>>>> Only compile-tested.
>>>>>>
>>>>>> Cc: Souptick Joarder <jrdr.linux@gmail.com>
>>>>>>
>>>>>> David Hildenbrand (3):
>>>>>>   mm/memory_hotplug: Export generic_online_page()
>>>>>>   hv_balloon: Use generic_online_page()
>>>>>>   mm/memory_hotplug: Remove __online_page_free() and
>>>>>>     __online_page_increment_counters()
>>>>>>
>>>>>>  drivers/hv/hv_balloon.c        |  3 +--
>>>>>>  include/linux/memory_hotplug.h |  4 +---
>>>>>>  mm/memory_hotplug.c            | 17 ++---------------
>>>>>>  3 files changed, 4 insertions(+), 20 deletions(-)
>>>>>>
>>>>>
>>>>> Ping, any comments on this one?
>>>>
>>>> Unification makes a lot of sense to me. You can add
>>>> Acked-by: Michal Hocko <mhocko@suse.com>
>>>>
>>>> I will most likely won't surprise if I asked for more here though ;)
>>>
>>> I'm not surprised, but definitely not in a negative sense ;) I was
>>> asking myself if we could somehow rework this, too.
>>>
>>>> I have to confess I really detest the whole concept of a hidden callback
>>>> with a very weird API. Is this something we can do about? I do realize
>>>> that adding a callback would require either cluttering the existing APIs
>>>> but maybe we can come up with something more clever. Or maybe existing
>>>> external users of online callback can do that as a separate step after
>>>> the online is completed - or is this impossible due to locking
>>>> guarantees?
>>>>
>>>
>>> The use case of this (somewhat special) callback really is to avoid
>>> selected (unbacked in the hypervisor) pages to get put to the buddy just
>>> now, but instead to defer that (sometimes, defer till infinity ;) ).
>>> Especially, to hinder these pages from getting touched at all. Pages
>>> that won't be put to the buddy will usually get PG_offline set (e.g.,
>>> Hyper-V and XEN) - the only two users I am aware of.
>>>
>>> For Hyper-V (and also eventually virtio-mem), it is important to set
>>> PG_offline before marking the section to be online (SECTION_IS_ONLINE).
>>> Only this way, PG_offline is properly set on all pfn_to_online_page()
>>> pages, meaning "don't touch this page" - e.g., used to skip over such
>>> pages when suspending or by makedumpfile to skip over such offline pages
>>> when creating a memory dump.
>>
>> Thanks for the clarification. I have never really studied what those
>> callbacks are doing really.
>>
>>> So if we would e.g., try to piggy-back onto the memory_notify()
>>> infrastructure, we could
>>> 1. Online all pages to the buddy (dropping the callback)
>>> 2. E.g., memory_notify(MEM_ONLINE_PAGES, &arg);
>>> -> in the notifier, pull pages from the buddy, mark sections online
>>> 3. Set all involved sections online (online_mem_sections())
>>
>> This doesn't really sound any better. For one pages are immediately
>> usable when they hit the buddy allocator so this is racy and thus not
>> reliable.
>>
>>> However, I am not sure what actually happens after 1. - we are only
>>> holding the device hotplug lock and the memory hotplug lock, so the
>>> pages can just get allocated. Also, it sounds like more work and code
>>> for the same end result (okay, if the rework is really necessary, though).
>>>
>>> So yeah, while the current callback might not be optimal, I don't see an
>>> easy and clean way to rework this. With the change in this series we are
>>> at least able to simply defer doing what would have been done without
>>> the callback - not perfect but better.
>>>
>>> Do you have anything in mind that could work out and make this nicer?
>>
>> I am wondering why those pages get onlined when they are, in fact,
>> supposed to be offline.
>>
> 
> It's the current way of emulating sub-memory-block hotplug on top of the
> memory bock device API we have. Hyper-V and XEN have been using that for
> a long time.
> 

So one idea would be to let clients set pages to PG_offline during
MEM_GOING_ONLINE. We could then skip any PG_offline pages when onlining
pages, not onlining them to the buddy.

But then, there still has to be a way to online pages when required -
e.g., generic_online_page(). At least the callback could go.

-- 

Thanks,

David / dhildenb

^ permalink raw reply

* Re: [PATCH v1 0/3] mm/memory_hotplug: Export generic_online_page()
From: Michal Hocko @ 2019-09-23 12:07 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Souptick Joarder, linux-hyperv,
	Andrew Morton, Dan Williams, Haiyang Zhang, K. Y. Srinivasan,
	Oscar Salvador, Pavel Tatashin, Qian Cai, Sasha Levin,
	Stephen Hemminger, Wei Yang
In-Reply-To: <88ac3511-4ad8-d5c8-8e6a-0cca0a0f0989@redhat.com>

On Mon 23-09-19 13:34:18, David Hildenbrand wrote:
> On 23.09.19 13:15, Michal Hocko wrote:
[...]
> > I am wondering why those pages get onlined when they are, in fact,
> > supposed to be offline.
> > 
> 
> It's the current way of emulating sub-memory-block hotplug on top of the
> memory bock device API we have. Hyper-V and XEN have been using that for
> a long time.

Do they really have to use the existing block interface when they in
fact do not operate on the block granularity? Zone device memory already
acts on sub section/block boundaries.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply

* Re: [PATCH v1 0/3] mm/memory_hotplug: Export generic_online_page()
From: David Hildenbrand @ 2019-09-23 12:20 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, linux-mm, Souptick Joarder, linux-hyperv,
	Andrew Morton, Dan Williams, Haiyang Zhang, K. Y. Srinivasan,
	Oscar Salvador, Pavel Tatashin, Qian Cai, Sasha Levin,
	Stephen Hemminger, Wei Yang
In-Reply-To: <20190923120719.GM6016@dhcp22.suse.cz>

On 23.09.19 14:07, Michal Hocko wrote:
> On Mon 23-09-19 13:34:18, David Hildenbrand wrote:
>> On 23.09.19 13:15, Michal Hocko wrote:
> [...]
>>> I am wondering why those pages get onlined when they are, in fact,
>>> supposed to be offline.
>>>
>>
>> It's the current way of emulating sub-memory-block hotplug on top of the
>> memory bock device API we have. Hyper-V and XEN have been using that for
>> a long time.
> 
> Do they really have to use the existing block interface when they in
> fact do not operate on the block granularity? Zone device memory already
> acts on sub section/block boundaries.
> 

Yes, we need memory blocks, especially for user space to properly online
them (as we discussed a while back, to decide on a zone) and for udev
events, to e.g., properly reload kexec when memory blocks get
added/removed/onlined/offlined.

ZONE_DEVICE is special as it doesn't have to worry about
onlining/offlining/dumping. Also, it doesn't care about
SECTION_IS_ONLINE, but figuring out how to detect which sub
sections/blocks have a valid memmap is still something to get fixed and
is still getting discussed on MM.

-- 

Thanks,

David / dhildenb

^ permalink raw reply

* Re: [PATCH v1 0/3] mm/memory_hotplug: Export generic_online_page()
From: Michal Hocko @ 2019-09-23 12:30 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Souptick Joarder, linux-hyperv,
	Andrew Morton, Dan Williams, Haiyang Zhang, K. Y. Srinivasan,
	Oscar Salvador, Pavel Tatashin, Qian Cai, Sasha Levin,
	Stephen Hemminger, Wei Yang
In-Reply-To: <b60b783e-a251-f155-3cef-e0fa4a18abd0@redhat.com>

On Mon 23-09-19 14:20:05, David Hildenbrand wrote:
> On 23.09.19 14:07, Michal Hocko wrote:
> > On Mon 23-09-19 13:34:18, David Hildenbrand wrote:
> >> On 23.09.19 13:15, Michal Hocko wrote:
> > [...]
> >>> I am wondering why those pages get onlined when they are, in fact,
> >>> supposed to be offline.
> >>>
> >>
> >> It's the current way of emulating sub-memory-block hotplug on top of the
> >> memory bock device API we have. Hyper-V and XEN have been using that for
> >> a long time.
> > 
> > Do they really have to use the existing block interface when they in
> > fact do not operate on the block granularity? Zone device memory already
> > acts on sub section/block boundaries.
> > 
> 
> Yes, we need memory blocks, especially for user space to properly online
> them (as we discussed a while back, to decide on a zone) and for udev
> events, to e.g., properly reload kexec when memory blocks get
> added/removed/onlined/offlined.

Just to make sure I really follow. We need a user interface to control
where the memory gets onlined but it is the driver which determines
which part of the block really gets onlined, right?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply

* Re: [PATCH v1 0/3] mm/memory_hotplug: Export generic_online_page()
From: David Hildenbrand @ 2019-09-23 12:34 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, linux-mm, Souptick Joarder, linux-hyperv,
	Andrew Morton, Dan Williams, Haiyang Zhang, K. Y. Srinivasan,
	Oscar Salvador, Pavel Tatashin, Qian Cai, Sasha Levin,
	Stephen Hemminger, Wei Yang
In-Reply-To: <20190923123008.GP6016@dhcp22.suse.cz>

On 23.09.19 14:30, Michal Hocko wrote:
> On Mon 23-09-19 14:20:05, David Hildenbrand wrote:
>> On 23.09.19 14:07, Michal Hocko wrote:
>>> On Mon 23-09-19 13:34:18, David Hildenbrand wrote:
>>>> On 23.09.19 13:15, Michal Hocko wrote:
>>> [...]
>>>>> I am wondering why those pages get onlined when they are, in fact,
>>>>> supposed to be offline.
>>>>>
>>>>
>>>> It's the current way of emulating sub-memory-block hotplug on top of the
>>>> memory bock device API we have. Hyper-V and XEN have been using that for
>>>> a long time.
>>>
>>> Do they really have to use the existing block interface when they in
>>> fact do not operate on the block granularity? Zone device memory already
>>> acts on sub section/block boundaries.
>>>
>>
>> Yes, we need memory blocks, especially for user space to properly online
>> them (as we discussed a while back, to decide on a zone) and for udev
>> events, to e.g., properly reload kexec when memory blocks get
>> added/removed/onlined/offlined.
> 
> Just to make sure I really follow. We need a user interface to control
> where the memory gets onlined but it is the driver which determines
> which part of the block really gets onlined, right?
> 

Yes, it's the drivers job to decide which part of the memory block can
actually get used, and when.

It's a little bit like the driver immediately allocates unbacked memory
again, to free that memory to the buddy once requested. (similar to how
ballooning works).

-- 

Thanks,

David / dhildenb

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox