* [PATCH v2 0/2] Drivers: hv: vmbus: Wait for offers and log missing offers
@ 2024-10-29 8:01 Naman Jain
2024-10-29 8:01 ` [PATCH v2 1/2] Drivers: hv: vmbus: Wait for offers during boot Naman Jain
2024-10-29 8:01 ` [PATCH v2 2/2] Drivers: hv: vmbus: Log on missing offers Naman Jain
0 siblings, 2 replies; 14+ messages in thread
From: Naman Jain @ 2024-10-29 8:01 UTC (permalink / raw)
To: K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui
Cc: linux-hyperv, linux-kernel, Naman Jain, John Starks, jacob.pan,
Easwar Hariharan, Michael Kelley, Saurabh Singh Sengar
After VM requests for channel offers during boot or resume from
hibernation, host offers the configured channels and then sends a
separate message when all the essential channel offers are delivered.
Wait for this message to make this offers request and receipt process
synchronous.
Without this, user mode can race with VMBus initialization and miss
channel offers. User mode has no way to work around this other than
sleeping for a while, since there is no way to know when VMBus has
finished processing offers.
This is in analogy to a PCI bus not returning from probe until it has
scanned all devices on the bus.
As part of this implementation, some code cleanup is also done for the
logic which becomes redundant due to this change.
Second patch prints the channels which are not offered when resume
happens from hibernation to supply more information to the end user.
Changes since v1:
https://lore.kernel.org/all/20241018115811.5530-1-namjain@linux.microsoft.com/
* Added Easwar's Reviewed-By tag
* Addressed Michael's comments:
* Added explanation of all offers delivered message in comments
* Removed infinite wait for offers logic, and changed it wait once.
* Removed sub channel workqueue flush logic
* Added comments on why MLX device offer is not expected as part of
this essential boot offer list. I refrained from adding too many
details on it as it felt like it is beyond the scope of this patch
series and may not be relevant to this. However, please let me know if
something needs to be added.
* Addressed Saurabh's comments:
* Changed timeout value to 10000 ms instead of 10*1000
* Changed commit msg as per suggestions
* Added a comment for warning case of wait_for_completion timeout
* Added a note for missing channel cleanup in comments and commit msg
John Starks (1):
Drivers: hv: vmbus: Log on missing offers
Naman Jain (1):
Drivers: hv: vmbus: Wait for offers during boot
drivers/hv/channel_mgmt.c | 55 ++++++++++++++++++++++++++++-----------
drivers/hv/connection.c | 4 +--
drivers/hv/hyperv_vmbus.h | 14 +++-------
drivers/hv/vmbus_drv.c | 31 +++++++++++-----------
4 files changed, 61 insertions(+), 43 deletions(-)
base-commit: 6fb2fa9805c501d9ade047fc511961f3273cdcb5
--
2.34.1
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v2 1/2] Drivers: hv: vmbus: Wait for offers during boot
2024-10-29 8:01 [PATCH v2 0/2] Drivers: hv: vmbus: Wait for offers and log missing offers Naman Jain
@ 2024-10-29 8:01 ` Naman Jain
2024-10-30 20:31 ` Easwar Hariharan
2024-10-31 19:13 ` Michael Kelley
2024-10-29 8:01 ` [PATCH v2 2/2] Drivers: hv: vmbus: Log on missing offers Naman Jain
1 sibling, 2 replies; 14+ messages in thread
From: Naman Jain @ 2024-10-29 8:01 UTC (permalink / raw)
To: K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui
Cc: linux-hyperv, linux-kernel, Naman Jain, John Starks, jacob.pan,
Easwar Hariharan, Michael Kelley, Saurabh Singh Sengar
Channel offers are requested during VMBus initialization and resume
from hibernation. Add support to wait for all channel offers to be
delivered and processed before returning from vmbus_request_offers.
This is in analogy to a PCI bus not returning from probe until it has
scanned all devices on the bus.
Without this, user mode can race with VMBus initialization and miss
channel offers. User mode has no way to work around this other than
sleeping for a while, since there is no way to know when VMBus has
finished processing offers.
With this added functionality, remove earlier logic which keeps track
of count of offered channels post resume from hibernation. Once all
offers delivered message is received, no further offers are going to
be received. Consequently, logic to prevent suspend from happening
after previous resume had missing offers, is also removed.
Co-developed-by: John Starks <jostarks@microsoft.com>
Signed-off-by: John Starks <jostarks@microsoft.com>
Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com>
---
Changes since v1:
https://lore.kernel.org/all/20241018115811.5530-1-namjain@linux.microsoft.com
* Added Easwar's Reviewed-By tag
* Addressed Michael's comments:
* Added explanation of all offers delivered message in comments
* Removed infinite wait for offers logic, and changed it wait once.
* Removed sub channel workqueue flush logic
* Added comments on why MLX device offer is not expected as part of
this essential boot offer list. I refrained from adding too many details on it as it felt like it is beyond the scope of this patch series and may not be relevant to this. However, please let me know if
something needs to be added.
* Addressed Saurabh's comments:
* Changed timeout value to 10000 ms instead of 10*10000
* Changed commit msg as per suggestions
* Added a comment for warning case of wait_for_completion timeout
---
drivers/hv/channel_mgmt.c | 55 ++++++++++++++++++++++++++++-----------
drivers/hv/connection.c | 4 +--
drivers/hv/hyperv_vmbus.h | 14 +++-------
drivers/hv/vmbus_drv.c | 16 ------------
4 files changed, 45 insertions(+), 44 deletions(-)
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 3c6011a48dab..a2e9ebe5bf72 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -944,16 +944,6 @@ void vmbus_initiate_unload(bool crash)
vmbus_wait_for_unload();
}
-static void check_ready_for_resume_event(void)
-{
- /*
- * If all the old primary channels have been fixed up, then it's safe
- * to resume.
- */
- if (atomic_dec_and_test(&vmbus_connection.nr_chan_fixup_on_resume))
- complete(&vmbus_connection.ready_for_resume_event);
-}
-
static void vmbus_setup_channel_state(struct vmbus_channel *channel,
struct vmbus_channel_offer_channel *offer)
{
@@ -1109,8 +1099,6 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
/* Add the channel back to the array of channels. */
vmbus_channel_map_relid(oldchannel);
- check_ready_for_resume_event();
-
mutex_unlock(&vmbus_connection.channel_mutex);
return;
}
@@ -1296,13 +1284,22 @@ EXPORT_SYMBOL_GPL(vmbus_hvsock_device_unregister);
/*
* vmbus_onoffers_delivered -
- * This is invoked when all offers have been delivered.
+ * CHANNELMSG_ALLOFFERS_DELIVERED message arrives after all the essential
+ * boot-time offers are delivered. Other channels can be hot added
+ * or removed later, even immediately after the all-offers-delivered
+ * message. A boot-time offer will be any of the virtual hardware the
+ * VM is configured with at boot.
*
- * Nothing to do here.
+ * Virtual devices like Mellanox NIC may not be included in the list of
+ * these initial boot offers because it is an optional accelerator to
+ * the synthetic VMBus NIC. It is hot added only after the VMBus NIC
+ * channel is opened (once it knows the guest can support it, via the
+ * sriov bit in the netvsc protocol).
*/
static void vmbus_onoffers_delivered(
struct vmbus_channel_message_header *hdr)
{
+ complete(&vmbus_connection.all_offers_delivered_event);
}
/*
@@ -1578,7 +1575,8 @@ void vmbus_onmessage(struct vmbus_channel_message_header *hdr)
}
/*
- * vmbus_request_offers - Send a request to get all our pending offers.
+ * vmbus_request_offers - Send a request to get all our pending offers
+ * and wait for all offers to arrive.
*/
int vmbus_request_offers(void)
{
@@ -1596,6 +1594,10 @@ int vmbus_request_offers(void)
msg->msgtype = CHANNELMSG_REQUESTOFFERS;
+ /*
+ * This REQUESTOFFERS message will result in the host sending an all
+ * offers delivered message.
+ */
ret = vmbus_post_msg(msg, sizeof(struct vmbus_channel_message_header),
true);
@@ -1607,6 +1609,29 @@ int vmbus_request_offers(void)
goto cleanup;
}
+ /*
+ * Wait for the host to send all offers.
+ * Keeping it as a best-effort mechanism, where a warning is
+ * printed if a timeout occurs, and execution is resumed.
+ */
+ if (!wait_for_completion_timeout(
+ &vmbus_connection.all_offers_delivered_event, msecs_to_jiffies(10000))) {
+ pr_warn("timed out waiting for all offers to be delivered...\n");
+ }
+
+ /*
+ * Flush handling of offer messages (which may initiate work on
+ * other work queues).
+ */
+ flush_workqueue(vmbus_connection.work_queue);
+
+ /*
+ * Flush workqueues for processing the incoming offers. Subchannel
+ * offers and processing can happen later, so there is no need to
+ * flush those workqueues here.
+ */
+ flush_workqueue(vmbus_connection.handle_primary_chan_wq);
+
cleanup:
kfree(msginfo);
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index f001ae880e1d..8351360bba16 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -34,8 +34,8 @@ struct vmbus_connection vmbus_connection = {
.ready_for_suspend_event = COMPLETION_INITIALIZER(
vmbus_connection.ready_for_suspend_event),
- .ready_for_resume_event = COMPLETION_INITIALIZER(
- vmbus_connection.ready_for_resume_event),
+ .all_offers_delivered_event = COMPLETION_INITIALIZER(
+ vmbus_connection.all_offers_delivered_event),
};
EXPORT_SYMBOL_GPL(vmbus_connection);
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index d2856023d53c..80cc65dac740 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -287,18 +287,10 @@ struct vmbus_connection {
struct completion ready_for_suspend_event;
/*
- * The number of primary channels that should be "fixed up"
- * upon resume: these channels are re-offered upon resume, and some
- * fields of the channel offers (i.e. child_relid and connection_id)
- * can change, so the old offermsg must be fixed up, before the resume
- * callbacks of the VSC drivers start to further touch the channels.
+ * Completed once the host has offered all channels. Note that
+ * some channels may still be being process on a work queue.
*/
- atomic_t nr_chan_fixup_on_resume;
- /*
- * vmbus_bus_resume() waits for "nr_chan_fixup_on_resume" to
- * drop to zero.
- */
- struct completion ready_for_resume_event;
+ struct completion all_offers_delivered_event;
};
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 9b15f7daf505..bd3fc41dc06b 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -2427,11 +2427,6 @@ static int vmbus_bus_suspend(struct device *dev)
if (atomic_read(&vmbus_connection.nr_chan_close_on_suspend) > 0)
wait_for_completion(&vmbus_connection.ready_for_suspend_event);
- if (atomic_read(&vmbus_connection.nr_chan_fixup_on_resume) != 0) {
- pr_err("Can not suspend due to a previous failed resuming\n");
- return -EBUSY;
- }
-
mutex_lock(&vmbus_connection.channel_mutex);
list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
@@ -2456,17 +2451,12 @@ static int vmbus_bus_suspend(struct device *dev)
pr_err("Sub-channel not deleted!\n");
WARN_ON_ONCE(1);
}
-
- atomic_inc(&vmbus_connection.nr_chan_fixup_on_resume);
}
mutex_unlock(&vmbus_connection.channel_mutex);
vmbus_initiate_unload(false);
- /* Reset the event for the next resume. */
- reinit_completion(&vmbus_connection.ready_for_resume_event);
-
return 0;
}
@@ -2502,14 +2492,8 @@ static int vmbus_bus_resume(struct device *dev)
if (ret != 0)
return ret;
- WARN_ON(atomic_read(&vmbus_connection.nr_chan_fixup_on_resume) == 0);
-
vmbus_request_offers();
- if (wait_for_completion_timeout(
- &vmbus_connection.ready_for_resume_event, 10 * HZ) == 0)
- pr_err("Some vmbus device is missing after suspending?\n");
-
/* Reset the event for the next suspend. */
reinit_completion(&vmbus_connection.ready_for_suspend_event);
--
2.34.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v2 2/2] Drivers: hv: vmbus: Log on missing offers
2024-10-29 8:01 [PATCH v2 0/2] Drivers: hv: vmbus: Wait for offers and log missing offers Naman Jain
2024-10-29 8:01 ` [PATCH v2 1/2] Drivers: hv: vmbus: Wait for offers during boot Naman Jain
@ 2024-10-29 8:01 ` Naman Jain
2024-10-31 19:14 ` Michael Kelley
1 sibling, 1 reply; 14+ messages in thread
From: Naman Jain @ 2024-10-29 8:01 UTC (permalink / raw)
To: K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui
Cc: linux-hyperv, linux-kernel, Naman Jain, John Starks, jacob.pan,
Easwar Hariharan, Michael Kelley, Saurabh Singh Sengar
From: John Starks <jostarks@microsoft.com>
When resuming from hibernation, log any channels that were present
before hibernation but now are gone.
In general, the essential virtual devices configured for a VM, remain
same, before and after the hibernation and its not very common that
some offers are missing. The cleanup of missing channels is not
straight-forward and dependent on individual device driver
functionality and implementation, so it can be added in future as
separate changes.
Signed-off-by: John Starks <jostarks@microsoft.com>
Co-developed-by: Naman Jain <namjain@linux.microsoft.com>
Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com>
---
Changes since v1:
https://lore.kernel.org/all/20241018115811.5530-1-namjain@linux.microsoft.com
* Added Easwar's Reviewed-By tag
* Addressed Saurabh's comments:
* Added a note for missing channel cleanup in comments and commit msg
---
drivers/hv/vmbus_drv.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index bd3fc41dc06b..08214f28694a 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -2462,6 +2462,7 @@ static int vmbus_bus_suspend(struct device *dev)
static int vmbus_bus_resume(struct device *dev)
{
+ struct vmbus_channel *channel;
struct vmbus_channel_msginfo *msginfo;
size_t msgsize;
int ret;
@@ -2494,6 +2495,22 @@ static int vmbus_bus_resume(struct device *dev)
vmbus_request_offers();
+ mutex_lock(&vmbus_connection.channel_mutex);
+ list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
+ if (channel->offermsg.child_relid != INVALID_RELID)
+ continue;
+
+ /* hvsock channels are not expected to be present. */
+ if (is_hvsock_channel(channel))
+ continue;
+
+ pr_err("channel %pUl/%pUl not present after resume.\n",
+ &channel->offermsg.offer.if_type,
+ &channel->offermsg.offer.if_instance);
+ /* ToDo: Cleanup these channels here */
+ }
+ mutex_unlock(&vmbus_connection.channel_mutex);
+
/* Reset the event for the next suspend. */
reinit_completion(&vmbus_connection.ready_for_suspend_event);
--
2.34.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v2 1/2] Drivers: hv: vmbus: Wait for offers during boot
2024-10-29 8:01 ` [PATCH v2 1/2] Drivers: hv: vmbus: Wait for offers during boot Naman Jain
@ 2024-10-30 20:31 ` Easwar Hariharan
2024-11-05 5:40 ` Naman Jain
2024-10-31 19:13 ` Michael Kelley
1 sibling, 1 reply; 14+ messages in thread
From: Easwar Hariharan @ 2024-10-30 20:31 UTC (permalink / raw)
To: Naman Jain, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui
Cc: eahariha, linux-hyperv, linux-kernel, John Starks, jacob.pan,
Michael Kelley, Saurabh Singh Sengar
On 10/29/2024 1:01 AM, Naman Jain wrote:
> Channel offers are requested during VMBus initialization and resume
> from hibernation. Add support to wait for all channel offers to be
> delivered and processed before returning from vmbus_request_offers.
>
> This is in analogy to a PCI bus not returning from probe until it has
> scanned all devices on the bus.
>
> Without this, user mode can race with VMBus initialization and miss
> channel offers. User mode has no way to work around this other than
> sleeping for a while, since there is no way to know when VMBus has
> finished processing offers.
>
> With this added functionality, remove earlier logic which keeps track
> of count of offered channels post resume from hibernation. Once all
> offers delivered message is received, no further offers are going to
> be received. Consequently, logic to prevent suspend from happening
> after previous resume had missing offers, is also removed.
>
> Co-developed-by: John Starks <jostarks@microsoft.com>
> Signed-off-by: John Starks <jostarks@microsoft.com>
> Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
> Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com>
> ---
> Changes since v1:
> https://lore.kernel.org/all/20241018115811.5530-1-namjain@linux.microsoft.com
> * Added Easwar's Reviewed-By tag
> * Addressed Michael's comments:
> * Added explanation of all offers delivered message in comments
> * Removed infinite wait for offers logic, and changed it wait once.
> * Removed sub channel workqueue flush logic
> * Added comments on why MLX device offer is not expected as part of
> this essential boot offer list. I refrained from adding too many details on it as it felt like it is beyond the scope of this patch series and may not be relevant to this. However, please let me know if
> something needs to be added.
> * Addressed Saurabh's comments:
> * Changed timeout value to 10000 ms instead of 10*10000
> * Changed commit msg as per suggestions
> * Added a comment for warning case of wait_for_completion timeout
> ---
> drivers/hv/channel_mgmt.c | 55 ++++++++++++++++++++++++++++-----------
> drivers/hv/connection.c | 4 +--
> drivers/hv/hyperv_vmbus.h | 14 +++-------
> drivers/hv/vmbus_drv.c | 16 ------------
> 4 files changed, 45 insertions(+), 44 deletions(-)
>
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index 3c6011a48dab..a2e9ebe5bf72 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -944,16 +944,6 @@ void vmbus_initiate_unload(bool crash)
> vmbus_wait_for_unload();
> }
>
> -static void check_ready_for_resume_event(void)
> -{
> - /*
> - * If all the old primary channels have been fixed up, then it's safe
> - * to resume.
> - */
> - if (atomic_dec_and_test(&vmbus_connection.nr_chan_fixup_on_resume))
> - complete(&vmbus_connection.ready_for_resume_event);
> -}
> -
> static void vmbus_setup_channel_state(struct vmbus_channel *channel,
> struct vmbus_channel_offer_channel *offer)
> {
> @@ -1109,8 +1099,6 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
>
> /* Add the channel back to the array of channels. */
> vmbus_channel_map_relid(oldchannel);
> - check_ready_for_resume_event();
> -
> mutex_unlock(&vmbus_connection.channel_mutex);
> return;
> }
> @@ -1296,13 +1284,22 @@ EXPORT_SYMBOL_GPL(vmbus_hvsock_device_unregister);
>
> /*
> * vmbus_onoffers_delivered -
> - * This is invoked when all offers have been delivered.
> + * CHANNELMSG_ALLOFFERS_DELIVERED message arrives after all the essential
> + * boot-time offers are delivered. Other channels can be hot added
> + * or removed later, even immediately after the all-offers-delivered
> + * message. A boot-time offer will be any of the virtual hardware the
> + * VM is configured with at boot.
> *
> - * Nothing to do here.
> + * Virtual devices like Mellanox NIC may not be included in the list of
> + * these initial boot offers because it is an optional accelerator to
> + * the synthetic VMBus NIC. It is hot added only after the VMBus NIC
> + * channel is opened (once it knows the guest can support it, via the
> + * sriov bit in the netvsc protocol).
> */
> static void vmbus_onoffers_delivered(
> struct vmbus_channel_message_header *hdr)
> {
> + complete(&vmbus_connection.all_offers_delivered_event);
> }
>
> /*
> @@ -1578,7 +1575,8 @@ void vmbus_onmessage(struct vmbus_channel_message_header *hdr)
> }
>
> /*
> - * vmbus_request_offers - Send a request to get all our pending offers.
> + * vmbus_request_offers - Send a request to get all our pending offers
> + * and wait for all offers to arrive.
> */
> int vmbus_request_offers(void)
> {
> @@ -1596,6 +1594,10 @@ int vmbus_request_offers(void)
>
> msg->msgtype = CHANNELMSG_REQUESTOFFERS;
>
> + /*
> + * This REQUESTOFFERS message will result in the host sending an all
> + * offers delivered message.
> + */
> ret = vmbus_post_msg(msg, sizeof(struct vmbus_channel_message_header),
> true);
>
> @@ -1607,6 +1609,29 @@ int vmbus_request_offers(void)
> goto cleanup;
> }
>
> + /*
> + * Wait for the host to send all offers.
> + * Keeping it as a best-effort mechanism, where a warning is
> + * printed if a timeout occurs, and execution is resumed.
> + */
> + if (!wait_for_completion_timeout(
> + &vmbus_connection.all_offers_delivered_event, msecs_to_jiffies(10000))) {
> + pr_warn("timed out waiting for all offers to be delivered...\n");
> + }
secs_to_jiffies() has been merged [1] so please update this to a call to
secs_to_jiffies(10). A Cocinelle script will probably take care of it
sooner or later in any case.
[1]
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=b35108a51cf7bab58d7eace1267d7965978bcdb8
> +
> + /*
> + * Flush handling of offer messages (which may initiate work on
> + * other work queues).
> + */
> + flush_workqueue(vmbus_connection.work_queue);
> +
> + /*
> + * Flush workqueues for processing the incoming offers. Subchannel
> + * offers and processing can happen later, so there is no need to
> + * flush those workqueues here.
> + */
> + flush_workqueue(vmbus_connection.handle_primary_chan_wq);
> +
> cleanup:
> kfree(msginfo);
>
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index f001ae880e1d..8351360bba16 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -34,8 +34,8 @@ struct vmbus_connection vmbus_connection = {
>
> .ready_for_suspend_event = COMPLETION_INITIALIZER(
> vmbus_connection.ready_for_suspend_event),
> - .ready_for_resume_event = COMPLETION_INITIALIZER(
> - vmbus_connection.ready_for_resume_event),
> + .all_offers_delivered_event = COMPLETION_INITIALIZER(
> + vmbus_connection.all_offers_delivered_event),
> };
> EXPORT_SYMBOL_GPL(vmbus_connection);
>
> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
> index d2856023d53c..80cc65dac740 100644
> --- a/drivers/hv/hyperv_vmbus.h
> +++ b/drivers/hv/hyperv_vmbus.h
> @@ -287,18 +287,10 @@ struct vmbus_connection {
> struct completion ready_for_suspend_event;
>
> /*
> - * The number of primary channels that should be "fixed up"
> - * upon resume: these channels are re-offered upon resume, and some
> - * fields of the channel offers (i.e. child_relid and connection_id)
> - * can change, so the old offermsg must be fixed up, before the resume
> - * callbacks of the VSC drivers start to further touch the channels.
> + * Completed once the host has offered all channels. Note that
> + * some channels may still be being process on a work queue.
> */
> - atomic_t nr_chan_fixup_on_resume;
> - /*
> - * vmbus_bus_resume() waits for "nr_chan_fixup_on_resume" to
> - * drop to zero.
> - */
> - struct completion ready_for_resume_event;
> + struct completion all_offers_delivered_event;
> };
>
>
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index 9b15f7daf505..bd3fc41dc06b 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -2427,11 +2427,6 @@ static int vmbus_bus_suspend(struct device *dev)
> if (atomic_read(&vmbus_connection.nr_chan_close_on_suspend) > 0)
> wait_for_completion(&vmbus_connection.ready_for_suspend_event);
>
> - if (atomic_read(&vmbus_connection.nr_chan_fixup_on_resume) != 0) {
> - pr_err("Can not suspend due to a previous failed resuming\n");
> - return -EBUSY;
> - }
> -
> mutex_lock(&vmbus_connection.channel_mutex);
>
> list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
> @@ -2456,17 +2451,12 @@ static int vmbus_bus_suspend(struct device *dev)
> pr_err("Sub-channel not deleted!\n");
> WARN_ON_ONCE(1);
> }
> -
> - atomic_inc(&vmbus_connection.nr_chan_fixup_on_resume);
> }
>
> mutex_unlock(&vmbus_connection.channel_mutex);
>
> vmbus_initiate_unload(false);
>
> - /* Reset the event for the next resume. */
> - reinit_completion(&vmbus_connection.ready_for_resume_event);
> -
> return 0;
> }
>
> @@ -2502,14 +2492,8 @@ static int vmbus_bus_resume(struct device *dev)
> if (ret != 0)
> return ret;
>
> - WARN_ON(atomic_read(&vmbus_connection.nr_chan_fixup_on_resume) == 0);
> -
> vmbus_request_offers();
>
> - if (wait_for_completion_timeout(
> - &vmbus_connection.ready_for_resume_event, 10 * HZ) == 0)
> - pr_err("Some vmbus device is missing after suspending?\n");
> -
> /* Reset the event for the next suspend. */
> reinit_completion(&vmbus_connection.ready_for_suspend_event);
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [PATCH v2 1/2] Drivers: hv: vmbus: Wait for offers during boot
2024-10-29 8:01 ` [PATCH v2 1/2] Drivers: hv: vmbus: Wait for offers during boot Naman Jain
2024-10-30 20:31 ` Easwar Hariharan
@ 2024-10-31 19:13 ` Michael Kelley
2024-11-05 5:40 ` Naman Jain
1 sibling, 1 reply; 14+ messages in thread
From: Michael Kelley @ 2024-10-31 19:13 UTC (permalink / raw)
To: Naman Jain, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui
Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
John Starks, jacob.pan@linux.microsoft.com, Easwar Hariharan,
Saurabh Singh Sengar
From: Naman Jain <namjain@linux.microsoft.com> Sent: Tuesday, October 29, 2024 1:02 AM
>
> Channel offers are requested during VMBus initialization and resume
> from hibernation. Add support to wait for all channel offers to be
Given the definition of ALLOFFERS_DELIVERED, maybe change to:
" wait for all boot-time channel offers"
> delivered and processed before returning from vmbus_request_offers.
>
> This is in analogy to a PCI bus not returning from probe until it has
> scanned all devices on the bus.
>
> Without this, user mode can race with VMBus initialization and miss
> channel offers. User mode has no way to work around this other than
> sleeping for a while, since there is no way to know when VMBus has
> finished processing offers.
"finished processing boot-time offers."
>
> With this added functionality, remove earlier logic which keeps track
> of count of offered channels post resume from hibernation. Once all
> offers delivered message is received, no further offers are going to
> be received.
"no further boot-time offers"
> Consequently, logic to prevent suspend from happening
> after previous resume had missing offers, is also removed.
>
> Co-developed-by: John Starks <jostarks@microsoft.com>
> Signed-off-by: John Starks <jostarks@microsoft.com>
> Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
> Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com>
> ---
> Changes since v1:
> https://lore.kernel.org/all/20241018115811.5530-1-namjain@linux.microsoft.com/
> * Added Easwar's Reviewed-By tag
> * Addressed Michael's comments:
> * Added explanation of all offers delivered message in comments
> * Removed infinite wait for offers logic, and changed it wait once.
> * Removed sub channel workqueue flush logic
> * Added comments on why MLX device offer is not expected as part of
> this essential boot offer list. I refrained from adding too many
You've used slightly different phrasing here ("essential boot offer list").
Potential confusion can be avoided if the terminology is super consistent.
I'm good with "boot-time offers" (or something else if you prefer). I'm not
sure what "essential" means here, unless it refers to offers for VFs from
SR-IOV NICs (like Mellanox) being excluded. But it should be fine to
use something short like "boot-time offers" and then note the VF
exception in the code comments as you've done.
> details on it as it felt like it is beyond the scope of this patch
> series and may not be relevant to this. However, please let me know if
> something needs to be added.
> * Addressed Saurabh's comments:
> * Changed timeout value to 10000 ms instead of 10*10000
> * Changed commit msg as per suggestions
> * Added a comment for warning case of wait_for_completion timeout
> ---
> drivers/hv/channel_mgmt.c | 55 ++++++++++++++++++++++++++++-----------
> drivers/hv/connection.c | 4 +--
> drivers/hv/hyperv_vmbus.h | 14 +++-------
> drivers/hv/vmbus_drv.c | 16 ------------
> 4 files changed, 45 insertions(+), 44 deletions(-)
>
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index 3c6011a48dab..a2e9ebe5bf72 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -944,16 +944,6 @@ void vmbus_initiate_unload(bool crash)
> vmbus_wait_for_unload();
> }
>
> -static void check_ready_for_resume_event(void)
> -{
> - /*
> - * If all the old primary channels have been fixed up, then it's safe
> - * to resume.
> - */
> - if (atomic_dec_and_test(&vmbus_connection.nr_chan_fixup_on_resume))
> - complete(&vmbus_connection.ready_for_resume_event);
> -}
> -
> static void vmbus_setup_channel_state(struct vmbus_channel *channel,
> struct vmbus_channel_offer_channel *offer)
> {
> @@ -1109,8 +1099,6 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
>
> /* Add the channel back to the array of channels. */
> vmbus_channel_map_relid(oldchannel);
> - check_ready_for_resume_event();
> -
> mutex_unlock(&vmbus_connection.channel_mutex);
> return;
> }
> @@ -1296,13 +1284,22 @@
> EXPORT_SYMBOL_GPL(vmbus_hvsock_device_unregister);
>
> /*
> * vmbus_onoffers_delivered -
> - * This is invoked when all offers have been delivered.
> + * CHANNELMSG_ALLOFFERS_DELIVERED message arrives after all the essential
Again, I would drop "essential" here, as its meaning in this context isn't
defined anywhere.
> + * boot-time offers are delivered. Other channels can be hot added
> + * or removed later, even immediately after the all-offers-delivered
> + * message. A boot-time offer will be any of the virtual hardware the
> + * VM is configured with at boot.
This is good to have a definition of "boot-time offer". But I think some
additional specification is appropriate. Let me suggest the following:
The CHANNELMSG_ALLOFFERS_DELIVERED message arrives after all
boot-time offers are delivered. A boot-time offer is for the primary channel
for any virtual hardware configured in the VM at the time it boots.
Boot-time offers include offers for physical devices assigned to the VM
via Hyper-V's Discrete Device Assignment (DDA) functionality that are
handled as virtual PCI devices in Linux (e.g., NVMe devices and GPUs).
Boot-time offers do not include offers for VMBus sub-channels. Because
devices can be hot-added to the VM after it is booted, additional channel
offers that aren't boot-time offers can be received at any time after the
all-offers-delivered message.
SR-IOV NIC Virtual Functions (VFs) assigned to a VM are not considered
to be assigned to the VM at boot-time, and offers for VFs may occur after
the all-offers-delivered message. VFs are optional accelerators to the
synthetic VMBus NIC and are effectively hot-added only after the VMBus
NIC channel is opened (once it knows the guest can support it, via the
sriov bit in the netvsc protocol).
[Please confirm that my suggested wording is correct!]
> */
> static void vmbus_onoffers_delivered(
> struct vmbus_channel_message_header *hdr)
> {
> + complete(&vmbus_connection.all_offers_delivered_event);
> }
>
> /*
> @@ -1578,7 +1575,8 @@ void vmbus_onmessage(struct vmbus_channel_message_header *hdr)
> }
>
> /*
> - * vmbus_request_offers - Send a request to get all our pending offers.
> + * vmbus_request_offers - Send a request to get all our pending offers
> + * and wait for all offers to arrive.
> */
> int vmbus_request_offers(void)
> {
> @@ -1596,6 +1594,10 @@ int vmbus_request_offers(void)
>
> msg->msgtype = CHANNELMSG_REQUESTOFFERS;
>
> + /*
> + * This REQUESTOFFERS message will result in the host sending an all
> + * offers delivered message.
> + */
> ret = vmbus_post_msg(msg, sizeof(struct vmbus_channel_message_header),
> true);
>
> @@ -1607,6 +1609,29 @@ int vmbus_request_offers(void)
> goto cleanup;
> }
>
> + /*
> + * Wait for the host to send all offers.
> + * Keeping it as a best-effort mechanism, where a warning is
> + * printed if a timeout occurs, and execution is resumed.
> + */
> + if (!wait_for_completion_timeout(
> + &vmbus_connection.all_offers_delivered_event, msecs_to_jiffies(10000))) {
> + pr_warn("timed out waiting for all offers to be delivered...\n");
> + }
> +
> + /*
> + * Flush handling of offer messages (which may initiate work on
> + * other work queues).
> + */
> + flush_workqueue(vmbus_connection.work_queue);
> +
> + /*
> + * Flush workqueues for processing the incoming offers. Subchannel
s/workqueues/workqueue/
> + * offers and processing can happen later, so there is no need to
> + * flush those workqueues here.
s/those workqueues/that workqueue/
> + */
> + flush_workqueue(vmbus_connection.handle_primary_chan_wq);
> +
> cleanup:
> kfree(msginfo);
>
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index f001ae880e1d..8351360bba16 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -34,8 +34,8 @@ struct vmbus_connection vmbus_connection = {
>
> .ready_for_suspend_event = COMPLETION_INITIALIZER(
> vmbus_connection.ready_for_suspend_event),
> - .ready_for_resume_event = COMPLETION_INITIALIZER(
> - vmbus_connection.ready_for_resume_event),
> + .all_offers_delivered_event = COMPLETION_INITIALIZER(
> + vmbus_connection.all_offers_delivered_event),
> };
> EXPORT_SYMBOL_GPL(vmbus_connection);
>
> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
> index d2856023d53c..80cc65dac740 100644
> --- a/drivers/hv/hyperv_vmbus.h
> +++ b/drivers/hv/hyperv_vmbus.h
> @@ -287,18 +287,10 @@ struct vmbus_connection {
> struct completion ready_for_suspend_event;
>
> /*
> - * The number of primary channels that should be "fixed up"
> - * upon resume: these channels are re-offered upon resume, and some
> - * fields of the channel offers (i.e. child_relid and connection_id)
> - * can change, so the old offermsg must be fixed up, before the resume
> - * callbacks of the VSC drivers start to further touch the channels.
> + * Completed once the host has offered all channels. Note that
"all boot-time channels"
> + * some channels may still be being process on a work queue.
> */
> - atomic_t nr_chan_fixup_on_resume;
> - /*
> - * vmbus_bus_resume() waits for "nr_chan_fixup_on_resume" to
> - * drop to zero.
> - */
> - struct completion ready_for_resume_event;
> + struct completion all_offers_delivered_event;
> };
>
>
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index 9b15f7daf505..bd3fc41dc06b 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -2427,11 +2427,6 @@ static int vmbus_bus_suspend(struct device *dev)
> if (atomic_read(&vmbus_connection.nr_chan_close_on_suspend) > 0)
> wait_for_completion(&vmbus_connection.ready_for_suspend_event);
>
> - if (atomic_read(&vmbus_connection.nr_chan_fixup_on_resume) != 0) {
> - pr_err("Can not suspend due to a previous failed resuming\n");
> - return -EBUSY;
> - }
> -
> mutex_lock(&vmbus_connection.channel_mutex);
>
> list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
> @@ -2456,17 +2451,12 @@ static int vmbus_bus_suspend(struct device *dev)
> pr_err("Sub-channel not deleted!\n");
> WARN_ON_ONCE(1);
> }
> -
> - atomic_inc(&vmbus_connection.nr_chan_fixup_on_resume);
> }
>
> mutex_unlock(&vmbus_connection.channel_mutex);
>
> vmbus_initiate_unload(false);
>
> - /* Reset the event for the next resume. */
> - reinit_completion(&vmbus_connection.ready_for_resume_event);
> -
> return 0;
> }
>
> @@ -2502,14 +2492,8 @@ static int vmbus_bus_resume(struct device *dev)
> if (ret != 0)
> return ret;
>
> - WARN_ON(atomic_read(&vmbus_connection.nr_chan_fixup_on_resume) == 0);
> -
> vmbus_request_offers();
>
> - if (wait_for_completion_timeout(
> - &vmbus_connection.ready_for_resume_event, 10 * HZ) == 0)
> - pr_err("Some vmbus device is missing after suspending?\n");
> -
> /* Reset the event for the next suspend. */
> reinit_completion(&vmbus_connection.ready_for_suspend_event);
>
> --
> 2.34.1
^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [PATCH v2 2/2] Drivers: hv: vmbus: Log on missing offers
2024-10-29 8:01 ` [PATCH v2 2/2] Drivers: hv: vmbus: Log on missing offers Naman Jain
@ 2024-10-31 19:14 ` Michael Kelley
2024-11-07 5:44 ` Naman Jain
0 siblings, 1 reply; 14+ messages in thread
From: Michael Kelley @ 2024-10-31 19:14 UTC (permalink / raw)
To: Naman Jain, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui
Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
John Starks, jacob.pan@linux.microsoft.com, Easwar Hariharan,
Saurabh Singh Sengar
From: Naman Jain <namjain@linux.microsoft.com> Sent: Tuesday, October 29, 2024 1:02 AM
>
> When resuming from hibernation, log any channels that were present
> before hibernation but now are gone.
> In general, the essential virtual devices configured for a VM, remain
> same, before and after the hibernation and its not very common that
> some offers are missing.
The wording here is a bit jumbled. And let's use consistent terminology.
I'd suggest:
In general, the boot-time devices configured for a resuming VM should be
the same as the devices in the VM at the time of hibernation. It's uncommon
for the configuration to have been changed such that offers are missing.
Changing the configuration violates the rules for hibernation anyway.
> The cleanup of missing channels is not
> straight-forward and dependent on individual device driver
> functionality and implementation, so it can be added in future as
> separate changes.
>
> Signed-off-by: John Starks <jostarks@microsoft.com>
> Co-developed-by: Naman Jain <namjain@linux.microsoft.com>
> Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
> Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com>
> ---
> Changes since v1:
> https://lore.kernel.org/all/20241018115811.5530-1-namjain@linux.microsoft.com/
> * Added Easwar's Reviewed-By tag
> * Addressed Saurabh's comments:
> * Added a note for missing channel cleanup in comments and commit msg
> ---
> drivers/hv/vmbus_drv.c | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index bd3fc41dc06b..08214f28694a 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -2462,6 +2462,7 @@ static int vmbus_bus_suspend(struct device *dev)
>
> static int vmbus_bus_resume(struct device *dev)
> {
> + struct vmbus_channel *channel;
> struct vmbus_channel_msginfo *msginfo;
> size_t msgsize;
> int ret;
> @@ -2494,6 +2495,22 @@ static int vmbus_bus_resume(struct device *dev)
>
> vmbus_request_offers();
>
> + mutex_lock(&vmbus_connection.channel_mutex);
> + list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
> + if (channel->offermsg.child_relid != INVALID_RELID)
> + continue;
> +
> + /* hvsock channels are not expected to be present. */
> + if (is_hvsock_channel(channel))
> + continue;
> +
> + pr_err("channel %pUl/%pUl not present after resume.\n",
> + &channel->offermsg.offer.if_type,
> + &channel->offermsg.offer.if_instance);
> + /* ToDo: Cleanup these channels here */
> + }
> + mutex_unlock(&vmbus_connection.channel_mutex);
> +
Dexuan and John have explained how in Azure VMs, there should not be
any VFs assigned to the VM at the time of hibernation. So the above
check for missing offers does not trigger an error message due to
VF offers coming after the all-offers-received message.
But what about the case of a VM running on a local Hyper-V? I'm not
completely clear, but in that case I don't think any VFs are removed
before the hibernation, especially for VM-initiated hibernation. It's
a reasonable scenario to later resume that same VM, with the same
VF assigned to the VM. Because of the way current code counts
the offers, vmbus_bus_resume() waits for the VF to be offered again,
and all the channels get correct post-resume relids. But the changes
in this patch set break that scenario. Since vmbus_bus_resume() now
proceeds before the VF offer arrives, hv_pci_resume() calling
vmbus_open() could use the pre-hibernation relid for the VF and break
things. Certainly the "not present after resume" error message would
be spurious.
Maybe the focus here is Azure, and it's tolerable for the local Hyper-V
case with a VF to not work pending later fixes. But I thought I'd call
out the potential issue (assuming my thinking is correct).
Michael
> /* Reset the event for the next suspend. */
> reinit_completion(&vmbus_connection.ready_for_suspend_event);
>
> --
> 2.34.1
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 1/2] Drivers: hv: vmbus: Wait for offers during boot
2024-10-30 20:31 ` Easwar Hariharan
@ 2024-11-05 5:40 ` Naman Jain
0 siblings, 0 replies; 14+ messages in thread
From: Naman Jain @ 2024-11-05 5:40 UTC (permalink / raw)
To: Easwar Hariharan, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui
Cc: linux-hyperv, linux-kernel, John Starks, jacob.pan,
Michael Kelley, Saurabh Singh Sengar
On 10/31/2024 2:01 AM, Easwar Hariharan wrote:
> On 10/29/2024 1:01 AM, Naman Jain wrote:
>> Channel offers are requested during VMBus initialization and resume
>> from hibernation. Add support to wait for all channel offers to be
>> delivered and processed before returning from vmbus_request_offers.
>>
>> This is in analogy to a PCI bus not returning from probe until it has
>> scanned all devices on the bus.
>>
>> Without this, user mode can race with VMBus initialization and miss
>> channel offers. User mode has no way to work around this other than
>> sleeping for a while, since there is no way to know when VMBus has
>> finished processing offers.
>>
>> With this added functionality, remove earlier logic which keeps track
>> of count of offered channels post resume from hibernation. Once all
>> offers delivered message is received, no further offers are going to
>> be received. Consequently, logic to prevent suspend from happening
>> after previous resume had missing offers, is also removed.
>>
>> Co-developed-by: John Starks <jostarks@microsoft.com>
>> Signed-off-by: John Starks <jostarks@microsoft.com>
>> Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
>> Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com>
>> ---
>> Changes since v1:
>> https://lore.kernel.org/all/20241018115811.5530-1-namjain@linux.microsoft.com
>> * Added Easwar's Reviewed-By tag
>> * Addressed Michael's comments:
>> * Added explanation of all offers delivered message in comments
>> * Removed infinite wait for offers logic, and changed it wait once.
>> * Removed sub channel workqueue flush logic
>> * Added comments on why MLX device offer is not expected as part of
>> this essential boot offer list. I refrained from adding too many details on it as it felt like it is beyond the scope of this patch series and may not be relevant to this. However, please let me know if
>> something needs to be added.
>> * Addressed Saurabh's comments:
>> * Changed timeout value to 10000 ms instead of 10*10000
>> * Changed commit msg as per suggestions
>> * Added a comment for warning case of wait_for_completion timeout
>> ---
>> drivers/hv/channel_mgmt.c | 55 ++++++++++++++++++++++++++++-----------
>> drivers/hv/connection.c | 4 +--
>> drivers/hv/hyperv_vmbus.h | 14 +++-------
>> drivers/hv/vmbus_drv.c | 16 ------------
>> 4 files changed, 45 insertions(+), 44 deletions(-)
>>
>> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
>> index 3c6011a48dab..a2e9ebe5bf72 100644
>> --- a/drivers/hv/channel_mgmt.c
>> +++ b/drivers/hv/channel_mgmt.c
>> @@ -944,16 +944,6 @@ void vmbus_initiate_unload(bool crash)
>> vmbus_wait_for_unload();
>> }
>>
>> -static void check_ready_for_resume_event(void)
>> -{
>> - /*
>> - * If all the old primary channels have been fixed up, then it's safe
>> - * to resume.
>> - */
>> - if (atomic_dec_and_test(&vmbus_connection.nr_chan_fixup_on_resume))
>> - complete(&vmbus_connection.ready_for_resume_event);
>> -}
>> -
>> static void vmbus_setup_channel_state(struct vmbus_channel *channel,
>> struct vmbus_channel_offer_channel *offer)
>> {
>> @@ -1109,8 +1099,6 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
>>
>> /* Add the channel back to the array of channels. */
>> vmbus_channel_map_relid(oldchannel);
>> - check_ready_for_resume_event();
>> -
>> mutex_unlock(&vmbus_connection.channel_mutex);
>> return;
>> }
>> @@ -1296,13 +1284,22 @@ EXPORT_SYMBOL_GPL(vmbus_hvsock_device_unregister);
>>
>> /*
>> * vmbus_onoffers_delivered -
>> - * This is invoked when all offers have been delivered.
>> + * CHANNELMSG_ALLOFFERS_DELIVERED message arrives after all the essential
>> + * boot-time offers are delivered. Other channels can be hot added
>> + * or removed later, even immediately after the all-offers-delivered
>> + * message. A boot-time offer will be any of the virtual hardware the
>> + * VM is configured with at boot.
>> *
>> - * Nothing to do here.
>> + * Virtual devices like Mellanox NIC may not be included in the list of
>> + * these initial boot offers because it is an optional accelerator to
>> + * the synthetic VMBus NIC. It is hot added only after the VMBus NIC
>> + * channel is opened (once it knows the guest can support it, via the
>> + * sriov bit in the netvsc protocol).
>> */
>> static void vmbus_onoffers_delivered(
>> struct vmbus_channel_message_header *hdr)
>> {
>> + complete(&vmbus_connection.all_offers_delivered_event);
>> }
>>
>> /*
>> @@ -1578,7 +1575,8 @@ void vmbus_onmessage(struct vmbus_channel_message_header *hdr)
>> }
>>
>> /*
>> - * vmbus_request_offers - Send a request to get all our pending offers.
>> + * vmbus_request_offers - Send a request to get all our pending offers
>> + * and wait for all offers to arrive.
>> */
>> int vmbus_request_offers(void)
>> {
>> @@ -1596,6 +1594,10 @@ int vmbus_request_offers(void)
>>
>> msg->msgtype = CHANNELMSG_REQUESTOFFERS;
>>
>> + /*
>> + * This REQUESTOFFERS message will result in the host sending an all
>> + * offers delivered message.
>> + */
>> ret = vmbus_post_msg(msg, sizeof(struct vmbus_channel_message_header),
>> true);
>>
>> @@ -1607,6 +1609,29 @@ int vmbus_request_offers(void)
>> goto cleanup;
>> }
>>
>> + /*
>> + * Wait for the host to send all offers.
>> + * Keeping it as a best-effort mechanism, where a warning is
>> + * printed if a timeout occurs, and execution is resumed.
>> + */
>> + if (!wait_for_completion_timeout(
>> + &vmbus_connection.all_offers_delivered_event, msecs_to_jiffies(10000))) {
>> + pr_warn("timed out waiting for all offers to be delivered...\n");
>> + }
>
> secs_to_jiffies() has been merged [1] so please update this to a call to
> secs_to_jiffies(10). A Cocinelle script will probably take care of it
> sooner or later in any case.
>
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=b35108a51cf7bab58d7eace1267d7965978bcdb8
>
Thank you. I will update it.
Regards,
Naman
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 1/2] Drivers: hv: vmbus: Wait for offers during boot
2024-10-31 19:13 ` Michael Kelley
@ 2024-11-05 5:40 ` Naman Jain
0 siblings, 0 replies; 14+ messages in thread
From: Naman Jain @ 2024-11-05 5:40 UTC (permalink / raw)
To: Michael Kelley, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui
Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
John Starks, jacob.pan@linux.microsoft.com, Easwar Hariharan,
Saurabh Singh Sengar
On 11/1/2024 12:43 AM, Michael Kelley wrote:
> From: Naman Jain <namjain@linux.microsoft.com> Sent: Tuesday, October 29, 2024 1:02 AM
>>
>> Channel offers are requested during VMBus initialization and resume
>> from hibernation. Add support to wait for all channel offers to be
>
> Given the definition of ALLOFFERS_DELIVERED, maybe change to:
>
> " wait for all boot-time channel offers"
Thanks for reviewing, acked.
>
>> delivered and processed before returning from vmbus_request_offers.
>>
>> This is in analogy to a PCI bus not returning from probe until it has
>> scanned all devices on the bus.
>>
>> Without this, user mode can race with VMBus initialization and miss
>> channel offers. User mode has no way to work around this other than
>> sleeping for a while, since there is no way to know when VMBus has
>> finished processing offers.
>
> "finished processing boot-time offers."
>
Acked.
>>
>> With this added functionality, remove earlier logic which keeps track
>> of count of offered channels post resume from hibernation. Once all
>> offers delivered message is received, no further offers are going to
>> be received.
>
> "no further boot-time offers"
Acked.
>
>> Consequently, logic to prevent suspend from happening
>> after previous resume had missing offers, is also removed.
>>
>> Co-developed-by: John Starks <jostarks@microsoft.com>
>> Signed-off-by: John Starks <jostarks@microsoft.com>
>> Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
>> Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com>
>> ---
>> Changes since v1:
>> https://lore.kernel.org/all/20241018115811.5530-1-namjain@linux.microsoft.com/
>> * Added Easwar's Reviewed-By tag
>> * Addressed Michael's comments:
>> * Added explanation of all offers delivered message in comments
>> * Removed infinite wait for offers logic, and changed it wait once.
>> * Removed sub channel workqueue flush logic
>> * Added comments on why MLX device offer is not expected as part of
>> this essential boot offer list. I refrained from adding too many
>
> You've used slightly different phrasing here ("essential boot offer list").
> Potential confusion can be avoided if the terminology is super consistent.
> I'm good with "boot-time offers" (or something else if you prefer). I'm not
> sure what "essential" means here, unless it refers to offers for VFs from
> SR-IOV NICs (like Mellanox) being excluded. But it should be fine to
> use something short like "boot-time offers" and then note the VF
> exception in the code comments as you've done.
By essential, I meant the ones that the VM is configured with.
I'll stick with "boot-time offers".
>
>> details on it as it felt like it is beyond the scope of this patch
>> series and may not be relevant to this. However, please let me know if
>> something needs to be added.
>> * Addressed Saurabh's comments:
>> * Changed timeout value to 10000 ms instead of 10*10000
>> * Changed commit msg as per suggestions
>> * Added a comment for warning case of wait_for_completion timeout
>> ---
>> drivers/hv/channel_mgmt.c | 55 ++++++++++++++++++++++++++++-----------
>> drivers/hv/connection.c | 4 +--
>> drivers/hv/hyperv_vmbus.h | 14 +++-------
>> drivers/hv/vmbus_drv.c | 16 ------------
>> 4 files changed, 45 insertions(+), 44 deletions(-)
>>
>> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
>> index 3c6011a48dab..a2e9ebe5bf72 100644
>> --- a/drivers/hv/channel_mgmt.c
>> +++ b/drivers/hv/channel_mgmt.c
>> @@ -944,16 +944,6 @@ void vmbus_initiate_unload(bool crash)
>> vmbus_wait_for_unload();
>> }
>>
>> -static void check_ready_for_resume_event(void)
>> -{
>> - /*
>> - * If all the old primary channels have been fixed up, then it's safe
>> - * to resume.
>> - */
>> - if (atomic_dec_and_test(&vmbus_connection.nr_chan_fixup_on_resume))
>> - complete(&vmbus_connection.ready_for_resume_event);
>> -}
>> -
>> static void vmbus_setup_channel_state(struct vmbus_channel *channel,
>> struct vmbus_channel_offer_channel *offer)
>> {
>> @@ -1109,8 +1099,6 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
>>
>> /* Add the channel back to the array of channels. */
>> vmbus_channel_map_relid(oldchannel);
>> - check_ready_for_resume_event();
>> -
>> mutex_unlock(&vmbus_connection.channel_mutex);
>> return;
>> }
>> @@ -1296,13 +1284,22 @@
>> EXPORT_SYMBOL_GPL(vmbus_hvsock_device_unregister);
>>
>> /*
>> * vmbus_onoffers_delivered -
>> - * This is invoked when all offers have been delivered.
>> + * CHANNELMSG_ALLOFFERS_DELIVERED message arrives after all the essential
>
> Again, I would drop "essential" here, as its meaning in this context isn't
> defined anywhere.
>
Acked.
>> + * boot-time offers are delivered. Other channels can be hot added
>> + * or removed later, even immediately after the all-offers-delivered
>> + * message. A boot-time offer will be any of the virtual hardware the
>> + * VM is configured with at boot.
>
> This is good to have a definition of "boot-time offer". But I think some
> additional specification is appropriate. Let me suggest the following:
>
> The CHANNELMSG_ALLOFFERS_DELIVERED message arrives after all
> boot-time offers are delivered. A boot-time offer is for the primary channel
> for any virtual hardware configured in the VM at the time it boots.
> Boot-time offers include offers for physical devices assigned to the VM
> via Hyper-V's Discrete Device Assignment (DDA) functionality that are
> handled as virtual PCI devices in Linux (e.g., NVMe devices and GPUs).
> Boot-time offers do not include offers for VMBus sub-channels. Because
> devices can be hot-added to the VM after it is booted, additional channel
> offers that aren't boot-time offers can be received at any time after the
> all-offers-delivered message.
>
> SR-IOV NIC Virtual Functions (VFs) assigned to a VM are not considered
> to be assigned to the VM at boot-time, and offers for VFs may occur after
> the all-offers-delivered message. VFs are optional accelerators to the
> synthetic VMBus NIC and are effectively hot-added only after the VMBus
> NIC channel is opened (once it knows the guest can support it, via the
> sriov bit in the netvsc protocol).
>
> [Please confirm that my suggested wording is correct!]
This explains really well. I'll rephrase the code comments. Thanks.
>
>> */
>> static void vmbus_onoffers_delivered(
>> struct vmbus_channel_message_header *hdr)
>> {
>> + complete(&vmbus_connection.all_offers_delivered_event);
>> }
>>
>> /*
>> @@ -1578,7 +1575,8 @@ void vmbus_onmessage(struct vmbus_channel_message_header *hdr)
>> }
>>
>> /*
>> - * vmbus_request_offers - Send a request to get all our pending offers.
>> + * vmbus_request_offers - Send a request to get all our pending offers
>> + * and wait for all offers to arrive.
>> */
>> int vmbus_request_offers(void)
>> {
>> @@ -1596,6 +1594,10 @@ int vmbus_request_offers(void)
>>
>> msg->msgtype = CHANNELMSG_REQUESTOFFERS;
>>
>> + /*
>> + * This REQUESTOFFERS message will result in the host sending an all
>> + * offers delivered message.
>> + */
>> ret = vmbus_post_msg(msg, sizeof(struct vmbus_channel_message_header),
>> true);
>>
>> @@ -1607,6 +1609,29 @@ int vmbus_request_offers(void)
>> goto cleanup;
>> }
>>
>> + /*
>> + * Wait for the host to send all offers.
>> + * Keeping it as a best-effort mechanism, where a warning is
>> + * printed if a timeout occurs, and execution is resumed.
>> + */
>> + if (!wait_for_completion_timeout(
>> + &vmbus_connection.all_offers_delivered_event, msecs_to_jiffies(10000))) {
>> + pr_warn("timed out waiting for all offers to be delivered...\n");
>> + }
>> +
>> + /*
>> + * Flush handling of offer messages (which may initiate work on
>> + * other work queues).
>> + */
>> + flush_workqueue(vmbus_connection.work_queue);
>> +
>> + /*
>> + * Flush workqueues for processing the incoming offers. Subchannel
>
> s/workqueues/workqueue/
Acked.
>
>> + * offers and processing can happen later, so there is no need to
>> + * flush those workqueues here.
>
> s/those workqueues/that workqueue/
Acked.
>
>> + */
>> + flush_workqueue(vmbus_connection.handle_primary_chan_wq);
>> +
>> cleanup:
>> kfree(msginfo);
>>
>> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
>> index f001ae880e1d..8351360bba16 100644
>> --- a/drivers/hv/connection.c
>> +++ b/drivers/hv/connection.c
>> @@ -34,8 +34,8 @@ struct vmbus_connection vmbus_connection = {
>>
>> .ready_for_suspend_event = COMPLETION_INITIALIZER(
>> vmbus_connection.ready_for_suspend_event),
>> - .ready_for_resume_event = COMPLETION_INITIALIZER(
>> - vmbus_connection.ready_for_resume_event),
>> + .all_offers_delivered_event = COMPLETION_INITIALIZER(
>> + vmbus_connection.all_offers_delivered_event),
>> };
>> EXPORT_SYMBOL_GPL(vmbus_connection);
>>
>> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
>> index d2856023d53c..80cc65dac740 100644
>> --- a/drivers/hv/hyperv_vmbus.h
>> +++ b/drivers/hv/hyperv_vmbus.h
>> @@ -287,18 +287,10 @@ struct vmbus_connection {
>> struct completion ready_for_suspend_event;
>>
>> /*
>> - * The number of primary channels that should be "fixed up"
>> - * upon resume: these channels are re-offered upon resume, and some
>> - * fields of the channel offers (i.e. child_relid and connection_id)
>> - * can change, so the old offermsg must be fixed up, before the resume
>> - * callbacks of the VSC drivers start to further touch the channels.
>> + * Completed once the host has offered all channels. Note that
>
> "all boot-time channels"
Acked.
>
>
>> + * some channels may still be being process on a work queue.
>> */
>> - atomic_t nr_chan_fixup_on_resume;
>> - /*
>> - * vmbus_bus_resume() waits for "nr_chan_fixup_on_resume" to
>> - * drop to zero.
>> - */
>> - struct completion ready_for_resume_event;
>> + struct completion all_offers_delivered_event;
>> };
>>
>>
>> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
>> index 9b15f7daf505..bd3fc41dc06b 100644
>> --- a/drivers/hv/vmbus_drv.c
>> +++ b/drivers/hv/vmbus_drv.c
>> @@ -2427,11 +2427,6 @@ static int vmbus_bus_suspend(struct device *dev)
>> if (atomic_read(&vmbus_connection.nr_chan_close_on_suspend) > 0)
>> wait_for_completion(&vmbus_connection.ready_for_suspend_event);
>>
>> - if (atomic_read(&vmbus_connection.nr_chan_fixup_on_resume) != 0) {
>> - pr_err("Can not suspend due to a previous failed resuming\n");
>> - return -EBUSY;
>> - }
>> -
>> mutex_lock(&vmbus_connection.channel_mutex);
>>
>> list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
>> @@ -2456,17 +2451,12 @@ static int vmbus_bus_suspend(struct device *dev)
>> pr_err("Sub-channel not deleted!\n");
>> WARN_ON_ONCE(1);
>> }
>> -
>> - atomic_inc(&vmbus_connection.nr_chan_fixup_on_resume);
>> }
>>
>> mutex_unlock(&vmbus_connection.channel_mutex);
>>
>> vmbus_initiate_unload(false);
>>
>> - /* Reset the event for the next resume. */
>> - reinit_completion(&vmbus_connection.ready_for_resume_event);
>> -
>> return 0;
>> }
>>
>> @@ -2502,14 +2492,8 @@ static int vmbus_bus_resume(struct device *dev)
>> if (ret != 0)
>> return ret;
>>
>> - WARN_ON(atomic_read(&vmbus_connection.nr_chan_fixup_on_resume) == 0);
>> -
>> vmbus_request_offers();
>>
>> - if (wait_for_completion_timeout(
>> - &vmbus_connection.ready_for_resume_event, 10 * HZ) == 0)
>> - pr_err("Some vmbus device is missing after suspending?\n");
>> -
>> /* Reset the event for the next suspend. */
>> reinit_completion(&vmbus_connection.ready_for_suspend_event);
>>
>> --
>> 2.34.1
Thanks,
Naman
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 2/2] Drivers: hv: vmbus: Log on missing offers
2024-10-31 19:14 ` Michael Kelley
@ 2024-11-07 5:44 ` Naman Jain
2024-11-11 5:43 ` Naman Jain
0 siblings, 1 reply; 14+ messages in thread
From: Naman Jain @ 2024-11-07 5:44 UTC (permalink / raw)
To: Michael Kelley, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui
Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
John Starks, jacob.pan@linux.microsoft.com, Easwar Hariharan,
Saurabh Singh Sengar
On 11/1/2024 12:44 AM, Michael Kelley wrote:
> From: Naman Jain <namjain@linux.microsoft.com> Sent: Tuesday, October 29, 2024 1:02 AM
>>
>> When resuming from hibernation, log any channels that were present
>> before hibernation but now are gone.
>> In general, the essential virtual devices configured for a VM, remain
>> same, before and after the hibernation and its not very common that
>> some offers are missing.
>
> The wording here is a bit jumbled. And let's use consistent terminology.
> I'd suggest:
>
> In general, the boot-time devices configured for a resuming VM should be
> the same as the devices in the VM at the time of hibernation. It's uncommon
> for the configuration to have been changed such that offers are missing.
> Changing the configuration violates the rules for hibernation anyway.
Sure, I'll make the required changes.
>
>> The cleanup of missing channels is not
>> straight-forward and dependent on individual device driver
>> functionality and implementation, so it can be added in future as
>> separate changes.
>>
>> Signed-off-by: John Starks <jostarks@microsoft.com>
>> Co-developed-by: Naman Jain <namjain@linux.microsoft.com>
>> Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
>> Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com>
>> ---
>> Changes since v1:
>> https://lore.kernel.org/all/20241018115811.5530-1-namjain@linux.microsoft.com/
>> * Added Easwar's Reviewed-By tag
>> * Addressed Saurabh's comments:
>> * Added a note for missing channel cleanup in comments and commit msg
>> ---
>> drivers/hv/vmbus_drv.c | 17 +++++++++++++++++
>> 1 file changed, 17 insertions(+)
>>
>> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
>> index bd3fc41dc06b..08214f28694a 100644
>> --- a/drivers/hv/vmbus_drv.c
>> +++ b/drivers/hv/vmbus_drv.c
>> @@ -2462,6 +2462,7 @@ static int vmbus_bus_suspend(struct device *dev)
>>
>> static int vmbus_bus_resume(struct device *dev)
>> {
>> + struct vmbus_channel *channel;
>> struct vmbus_channel_msginfo *msginfo;
>> size_t msgsize;
>> int ret;
>> @@ -2494,6 +2495,22 @@ static int vmbus_bus_resume(struct device *dev)
>>
>> vmbus_request_offers();
>>
>> + mutex_lock(&vmbus_connection.channel_mutex);
>> + list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
>> + if (channel->offermsg.child_relid != INVALID_RELID)
>> + continue;
>> +
>> + /* hvsock channels are not expected to be present. */
>> + if (is_hvsock_channel(channel))
>> + continue;
>> +
>> + pr_err("channel %pUl/%pUl not present after resume.\n",
>> + &channel->offermsg.offer.if_type,
>> + &channel->offermsg.offer.if_instance);
>> + /* ToDo: Cleanup these channels here */
>> + }
>> + mutex_unlock(&vmbus_connection.channel_mutex);
>> +
>
> Dexuan and John have explained how in Azure VMs, there should not be
> any VFs assigned to the VM at the time of hibernation. So the above
> check for missing offers does not trigger an error message due to
> VF offers coming after the all-offers-received message.
>
> But what about the case of a VM running on a local Hyper-V? I'm not
> completely clear, but in that case I don't think any VFs are removed
> before the hibernation, especially for VM-initiated hibernation. It's
I am not sure about this behavior. I have requested Dexuan offline
for a comment.
> a reasonable scenario to later resume that same VM, with the same
> VF assigned to the VM. Because of the way current code counts
> the offers, vmbus_bus_resume() waits for the VF to be offered again,
> and all the channels get correct post-resume relids. But the changes
> in this patch set break that scenario. Since vmbus_bus_resume() now
> proceeds before the VF offer arrives, hv_pci_resume() calling
> vmbus_open() could use the pre-hibernation relid for the VF and break
> things. Certainly the "not present after resume" error message would
> be spurious.
>
> Maybe the focus here is Azure, and it's tolerable for the local Hyper-V
> case with a VF to not work pending later fixes. But I thought I'd call
> out the potential issue (assuming my thinking is correct).
>
> Michael
IIUC, below scenarios can happen based on your comment-
Case 1:
VF channel offer is received in time before hv_pci_resume() and there
are no problems.
Case 2:
Resume proceeds just after getting ALLOFFERS_DELIVERED msg and a warning
is printed that this VF channel is not present after resume.
Then two scenarios can happen:
Case 2.1:
VF channel offer is received before hv_pci_resume() and things work
fine. Warning printed is spurious.
Case 2.2:
VM channel offer is not received before hv_pci_resume() and relid is
not yet restored by onoffer. This is a problem. Warning is printed in
this case for missing offer.
I think it all depends on whether or not VFs are removed in local
HyperV VMs. I'll try to get this information. Thanks for pointing this
out.
Regards,
Naman
>
>> /* Reset the event for the next suspend. */
>> reinit_completion(&vmbus_connection.ready_for_suspend_event);
>>
>> --
>> 2.34.1
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 2/2] Drivers: hv: vmbus: Log on missing offers
2024-11-07 5:44 ` Naman Jain
@ 2024-11-11 5:43 ` Naman Jain
2024-11-12 3:13 ` Michael Kelley
0 siblings, 1 reply; 14+ messages in thread
From: Naman Jain @ 2024-11-11 5:43 UTC (permalink / raw)
To: Michael Kelley, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui
Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
John Starks, jacob.pan@linux.microsoft.com, Easwar Hariharan,
Saurabh Singh Sengar
On 11/7/2024 11:14 AM, Naman Jain wrote:
>
>
> On 11/1/2024 12:44 AM, Michael Kelley wrote:
>> From: Naman Jain <namjain@linux.microsoft.com> Sent: Tuesday, October
>> 29, 2024 1:02 AM
>>>
>>> When resuming from hibernation, log any channels that were present
>>> before hibernation but now are gone.
>>> In general, the essential virtual devices configured for a VM, remain
>>> same, before and after the hibernation and its not very common that
>>> some offers are missing.
>>
>> The wording here is a bit jumbled. And let's use consistent terminology.
>> I'd suggest:
>>
>> In general, the boot-time devices configured for a resuming VM should be
>> the same as the devices in the VM at the time of hibernation. It's
>> uncommon
>> for the configuration to have been changed such that offers are missing.
>> Changing the configuration violates the rules for hibernation anyway.
>
> Sure, I'll make the required changes.
>
>>
>>> The cleanup of missing channels is not
>>> straight-forward and dependent on individual device driver
>>> functionality and implementation, so it can be added in future as
>>> separate changes.
>>>
>>> Signed-off-by: John Starks <jostarks@microsoft.com>
>>> Co-developed-by: Naman Jain <namjain@linux.microsoft.com>
>>> Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
>>> Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com>
>>> ---
>>> Changes since v1:
>>> https://lore.kernel.org/all/20241018115811.5530-1-
>>> namjain@linux.microsoft.com/
>>> * Added Easwar's Reviewed-By tag
>>> * Addressed Saurabh's comments:
>>> * Added a note for missing channel cleanup in comments and commit msg
>>> ---
>>> drivers/hv/vmbus_drv.c | 17 +++++++++++++++++
>>> 1 file changed, 17 insertions(+)
>>>
>>> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
>>> index bd3fc41dc06b..08214f28694a 100644
>>> --- a/drivers/hv/vmbus_drv.c
>>> +++ b/drivers/hv/vmbus_drv.c
>>> @@ -2462,6 +2462,7 @@ static int vmbus_bus_suspend(struct device *dev)
>>>
>>> static int vmbus_bus_resume(struct device *dev)
>>> {
>>> + struct vmbus_channel *channel;
>>> struct vmbus_channel_msginfo *msginfo;
>>> size_t msgsize;
>>> int ret;
>>> @@ -2494,6 +2495,22 @@ static int vmbus_bus_resume(struct device *dev)
>>>
>>> vmbus_request_offers();
>>>
>>> + mutex_lock(&vmbus_connection.channel_mutex);
>>> + list_for_each_entry(channel, &vmbus_connection.chn_list,
>>> listentry) {
>>> + if (channel->offermsg.child_relid != INVALID_RELID)
>>> + continue;
>>> +
>>> + /* hvsock channels are not expected to be present. */
>>> + if (is_hvsock_channel(channel))
>>> + continue;
>>> +
>>> + pr_err("channel %pUl/%pUl not present after resume.\n",
>>> + &channel->offermsg.offer.if_type,
>>> + &channel->offermsg.offer.if_instance);
>>> + /* ToDo: Cleanup these channels here */
>>> + }
>>> + mutex_unlock(&vmbus_connection.channel_mutex);
>>> +
>>
>> Dexuan and John have explained how in Azure VMs, there should not be
>> any VFs assigned to the VM at the time of hibernation. So the above
>> check for missing offers does not trigger an error message due to
>> VF offers coming after the all-offers-received message.
>>
>> But what about the case of a VM running on a local Hyper-V? I'm not
>> completely clear, but in that case I don't think any VFs are removed
>> before the hibernation, especially for VM-initiated hibernation. It's
>
> I am not sure about this behavior. I have requested Dexuan offline
> for a comment.
>
>> a reasonable scenario to later resume that same VM, with the same
>> VF assigned to the VM. Because of the way current code counts
>> the offers, vmbus_bus_resume() waits for the VF to be offered again,
>> and all the channels get correct post-resume relids. But the changes
>> in this patch set break that scenario. Since vmbus_bus_resume() now
>> proceeds before the VF offer arrives, hv_pci_resume() calling
>> vmbus_open() could use the pre-hibernation relid for the VF and break
>> things. Certainly the "not present after resume" error message would
>> be spurious.
>>
>> Maybe the focus here is Azure, and it's tolerable for the local Hyper-V
>> case with a VF to not work pending later fixes. But I thought I'd call
>> out the potential issue (assuming my thinking is correct).
>>
>> Michael
>
> IIUC, below scenarios can happen based on your comment-
>
> Case 1:
> VF channel offer is received in time before hv_pci_resume() and there
> are no problems.
>
> Case 2:
> Resume proceeds just after getting ALLOFFERS_DELIVERED msg and a warning
> is printed that this VF channel is not present after resume.
> Then two scenarios can happen:
> Case 2.1:
> VF channel offer is received before hv_pci_resume() and things work
> fine. Warning printed is spurious.
> Case 2.2:
> VM channel offer is not received before hv_pci_resume() and relid is
> not yet restored by onoffer. This is a problem. Warning is printed in
> this case for missing offer.
>
> I think it all depends on whether or not VFs are removed in local
> HyperV VMs. I'll try to get this information. Thanks for pointing this
> out.
>
> Regards,
> Naman
>
Hi Michael,
I discussed with Dexuan and we tried these scenarios. Here are the
observations:
For the two ways of host initiated hibernation:
#1: Invoke-Hibernate $vm -Device (Uses the guest shutdown component)
OR
#2: Invoke-Hibernate $vm -ComputerSystem (Uses the RequestStateChange
ability)
#1 does not remove the VF before sending the hibernate message to the VM
via hv_utils, but #2 does.
With both #1 and #2, during resume, the host offers the vPCI vmbus
device before the vmbus_onoffers_delivered() is called. Whether or not
VFs are removed doesn't matter here, because during resume the first
fresh kernel always requests the VF device, meaning it has become a
boot-time device when the 'old' kernel is resuming back. So the issue we
are discussing will not happen in practice and the patch won't break
things and won't print spurious warnings. If its OK, please let me know,
I'll then proceed with v3.
Thanks,
Naman
>
>>
>>> /* Reset the event for the next suspend. */
>>> reinit_completion(&vmbus_connection.ready_for_suspend_event);
>>>
>>> --
>>> 2.34.1
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [PATCH v2 2/2] Drivers: hv: vmbus: Log on missing offers
2024-11-11 5:43 ` Naman Jain
@ 2024-11-12 3:13 ` Michael Kelley
2024-11-13 8:47 ` Naman Jain
0 siblings, 1 reply; 14+ messages in thread
From: Michael Kelley @ 2024-11-12 3:13 UTC (permalink / raw)
To: Naman Jain, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui
Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
John Starks, jacob.pan@linux.microsoft.com, Easwar Hariharan,
Saurabh Singh Sengar
From: Naman Jain <namjain@linux.microsoft.com> Sent: Sunday, November 10, 2024 9:44 PM
>
> On 11/7/2024 11:14 AM, Naman Jain wrote:
> >
> > On 11/1/2024 12:44 AM, Michael Kelley wrote:
> >> From: Naman Jain <namjain@linux.microsoft.com> Sent: Tuesday, October 29, 2024 1:02 AM
> >>>
[snip]
> >>> @@ -2494,6 +2495,22 @@ static int vmbus_bus_resume(struct device *dev)
> >>>
> >>> vmbus_request_offers();
> >>>
> >>> + mutex_lock(&vmbus_connection.channel_mutex);
> >>> + list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
> >>> + if (channel->offermsg.child_relid != INVALID_RELID)
> >>> + continue;
> >>> +
> >>> + /* hvsock channels are not expected to be present. */
> >>> + if (is_hvsock_channel(channel))
> >>> + continue;
> >>> +
> >>> + pr_err("channel %pUl/%pUl not present after resume.\n",
> >>> + &channel->offermsg.offer.if_type,
> >>> + &channel->offermsg.offer.if_instance);
> >>> + /* ToDo: Cleanup these channels here */
> >>> + }
> >>> + mutex_unlock(&vmbus_connection.channel_mutex);
> >>> +
> >>
> >> Dexuan and John have explained how in Azure VMs, there should not be
> >> any VFs assigned to the VM at the time of hibernation. So the above
> >> check for missing offers does not trigger an error message due to
> >> VF offers coming after the all-offers-received message.
> >>
> >> But what about the case of a VM running on a local Hyper-V? I'm not
> >> completely clear, but in that case I don't think any VFs are removed
> >> before the hibernation, especially for VM-initiated hibernation. It's
> >
> > I am not sure about this behavior. I have requested Dexuan offline
> > for a comment.
> >
> >> a reasonable scenario to later resume that same VM, with the same
> >> VF assigned to the VM. Because of the way current code counts
> >> the offers, vmbus_bus_resume() waits for the VF to be offered again,
> >> and all the channels get correct post-resume relids. But the changes
> >> in this patch set break that scenario. Since vmbus_bus_resume() now
> >> proceeds before the VF offer arrives, hv_pci_resume() calling
> >> vmbus_open() could use the pre-hibernation relid for the VF and break
> >> things. Certainly the "not present after resume" error message would
> >> be spurious.
> >>
> >> Maybe the focus here is Azure, and it's tolerable for the local Hyper-V
> >> case with a VF to not work pending later fixes. But I thought I'd call
> >> out the potential issue (assuming my thinking is correct).
> >>
> >> Michael
> >
> > IIUC, below scenarios can happen based on your comment-
> >
> > Case 1:
> > VF channel offer is received in time before hv_pci_resume() and there
> > are no problems.
> >
> > Case 2:
> > Resume proceeds just after getting ALLOFFERS_DELIVERED msg and a warning
> > is printed that this VF channel is not present after resume.
> > Then two scenarios can happen:
> > Case 2.1:
> > VF channel offer is received before hv_pci_resume() and things work
> > fine. Warning printed is spurious.
> > Case 2.2:
> > VM channel offer is not received before hv_pci_resume() and relid is
> > not yet restored by onoffer. This is a problem. Warning is printed in
> > this case for missing offer.
> >
> > I think it all depends on whether or not VFs are removed in local
> > HyperV VMs. I'll try to get this information. Thanks for pointing this
> > out.
> >
> > Regards,
> > Naman
> >
>
> Hi Michael,
> I discussed with Dexuan and we tried these scenarios. Here are the
> observations:
>
> For the two ways of host initiated hibernation:
>
> #1: Invoke-Hibernate $vm -Device (Uses the guest shutdown component)
> OR
> #2: Invoke-Hibernate $vm -ComputerSystem (Uses the RequestStateChange
> ability)
Question: What Powershell module provides "Invoke-Hibernate"? It's not
present on my Windows 11 system that is running Hyper-V, and I can't
find any documentation about it on the web. Or maybe Invoke-Hibernate
is a Powershell *script*?
>
> #1 does not remove the VF before sending the hibernate message to the VM
> via hv_utils, but #2 does.
>
> With both #1 and #2, during resume, the host offers the vPCI vmbus
> device before the vmbus_onoffers_delivered() is called. Whether or not
> VFs are removed doesn't matter here, because during resume the first
> fresh kernel always requests the VF device, meaning it has become a
> boot-time device when the 'old' kernel is resuming back. So the issue we
> are discussing will not happen in practice and the patch won't break
> things and won't print spurious warnings. If its OK, please let me know,
> I'll then proceed with v3.
>
Ah, this is interesting. I'm assuming these are the details:
1) VM boots with the intent of resuming from hibernation (though
Hyper-V doesn't know about that intent)
2) Original fresh kernel is loaded and begins initialization
3) VMBus offers come in for boot-time devices, which excludes SR-IOV VFs.
4) ALLOFFERS_DELIVERED message comes in
5) The storvsc driver initializes for the virtual disks on the VM
6) Kernel initialization code finds and reads the swap space to see if a
hibernation image is present. If so, it reads in the hibernation image.
7) The suspend sequence is initiated (just like during hibernation)
to shutdown the VMBus devices and terminate the VMBus connection.
8) Control is transferred to the previously read-in hibernation image
9) The hibernation image runs the resume sequence, which
initiates a new VMBus connection and requests offers
10) VMBus offers come in for whatever VMBus devices were present
when Step 7 initiated the suspend sequence. If a VF device was present
at that time, an offer for that VF device will come in and will match up
with the VF that was present in the VM at the time of hibernation.
11) ALLOFFERS_DELIVERED message comes in again for the
newly initiated VMBus connection.
The netvsc driver gets initialized *after* step 4, but we don't know
exactly *when* relative to the storvsc driver. The netvsc driver must
tell Hyper-V that it can handle an SR-IOV VF, and the VF offer is sent
sometime after that. While this netvsc/VF sequence is happening, the
storvsc driver is reading the hibernation image from swap (Step 6).
I think the sequence you describe works when reading the
hibernation image from swap takes 10's of seconds, or even several
minutes in an Azure VM with a remote disk. That gives plenty
of time for the VF to get initialized and be fully present when Step 7
starts. But there's no *guarantee* that the VF is initialized by then.
It's also not clear to me what action by the guest causes Hyper-V to
treat the VF as "added to the VM" so that in Step 10 the VF offer is
sent before ALLOFFERS_DELIVERED.
The sequence you describe also happens in an Azure VM, even if
the VF is removed before hibernation. When the VF offer arrives
during Step 10, it doesn't match with any VFs that were in the VM
at the time of hibernation. It's treated as a new device, just like it
would be if the offer arrived after ALLOFFERS_DELIVERED.
But it seems like there's still the risk of having a fast swap disk
and a small hibernation image that can be read in a shorter amount
of time than it takes to initialize the VF to the point that Hyper-V
treats it as added to the VM. Without knowing what that point is,
it's hard to assess the likelihood of that happening. Or maybe there's
an interlock I'm not aware of that ensures Step 7 can't proceed
while the netvsc/VF sequence is in progress.
So maybe it's best to proceed with this patch, and deal with the
risk later when/if it becomes reality. I'm OK if you want to do
that. This has been an interesting discussion that I'll try to capture
in some high-level documentation about how Linux guests on
Hyper-V do hibernation!
Michael
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 2/2] Drivers: hv: vmbus: Log on missing offers
2024-11-12 3:13 ` Michael Kelley
@ 2024-11-13 8:47 ` Naman Jain
2024-11-13 15:26 ` Michael Kelley
0 siblings, 1 reply; 14+ messages in thread
From: Naman Jain @ 2024-11-13 8:47 UTC (permalink / raw)
To: Michael Kelley, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui
Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
John Starks, jacob.pan@linux.microsoft.com, Easwar Hariharan,
Saurabh Singh Sengar
On 11/12/2024 8:43 AM, Michael Kelley wrote:
> From: Naman Jain <namjain@linux.microsoft.com> Sent: Sunday, November 10, 2024 9:44 PM
>>
>> On 11/7/2024 11:14 AM, Naman Jain wrote:
>>>
>>> On 11/1/2024 12:44 AM, Michael Kelley wrote:
>>>> From: Naman Jain <namjain@linux.microsoft.com> Sent: Tuesday, October 29, 2024 1:02 AM
>>>>>
>
> [snip]
>
>>>>> @@ -2494,6 +2495,22 @@ static int vmbus_bus_resume(struct device *dev)
>>>>>
>>>>> vmbus_request_offers();
>>>>>
>>>>> + mutex_lock(&vmbus_connection.channel_mutex);
>>>>> + list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
>>>>> + if (channel->offermsg.child_relid != INVALID_RELID)
>>>>> + continue;
>>>>> +
>>>>> + /* hvsock channels are not expected to be present. */
>>>>> + if (is_hvsock_channel(channel))
>>>>> + continue;
>>>>> +
>>>>> + pr_err("channel %pUl/%pUl not present after resume.\n",
>>>>> + &channel->offermsg.offer.if_type,
>>>>> + &channel->offermsg.offer.if_instance);
>>>>> + /* ToDo: Cleanup these channels here */
>>>>> + }
>>>>> + mutex_unlock(&vmbus_connection.channel_mutex);
>>>>> +
>>>>
>>>> Dexuan and John have explained how in Azure VMs, there should not be
>>>> any VFs assigned to the VM at the time of hibernation. So the above
>>>> check for missing offers does not trigger an error message due to
>>>> VF offers coming after the all-offers-received message.
>>>>
>>>> But what about the case of a VM running on a local Hyper-V? I'm not
>>>> completely clear, but in that case I don't think any VFs are removed
>>>> before the hibernation, especially for VM-initiated hibernation. It's
>>>
>>> I am not sure about this behavior. I have requested Dexuan offline
>>> for a comment.
>>>
>>>> a reasonable scenario to later resume that same VM, with the same
>>>> VF assigned to the VM. Because of the way current code counts
>>>> the offers, vmbus_bus_resume() waits for the VF to be offered again,
>>>> and all the channels get correct post-resume relids. But the changes
>>>> in this patch set break that scenario. Since vmbus_bus_resume() now
>>>> proceeds before the VF offer arrives, hv_pci_resume() calling
>>>> vmbus_open() could use the pre-hibernation relid for the VF and break
>>>> things. Certainly the "not present after resume" error message would
>>>> be spurious.
>>>>
>>>> Maybe the focus here is Azure, and it's tolerable for the local Hyper-V
>>>> case with a VF to not work pending later fixes. But I thought I'd call
>>>> out the potential issue (assuming my thinking is correct).
>>>>
>>>> Michael
>>>
>>> IIUC, below scenarios can happen based on your comment-
>>>
>>> Case 1:
>>> VF channel offer is received in time before hv_pci_resume() and there
>>> are no problems.
>>>
>>> Case 2:
>>> Resume proceeds just after getting ALLOFFERS_DELIVERED msg and a warning
>>> is printed that this VF channel is not present after resume.
>>> Then two scenarios can happen:
>>> Case 2.1:
>>> VF channel offer is received before hv_pci_resume() and things work
>>> fine. Warning printed is spurious.
>>> Case 2.2:
>>> VM channel offer is not received before hv_pci_resume() and relid is
>>> not yet restored by onoffer. This is a problem. Warning is printed in
>>> this case for missing offer.
>>>
>>> I think it all depends on whether or not VFs are removed in local
>>> HyperV VMs. I'll try to get this information. Thanks for pointing this
>>> out.
>>>
>>> Regards,
>>> Naman
>>>
>>
>> Hi Michael,
>> I discussed with Dexuan and we tried these scenarios. Here are the
>> observations:
>>
>> For the two ways of host initiated hibernation:
>>
>> #1: Invoke-Hibernate $vm -Device (Uses the guest shutdown component)
>> OR
>> #2: Invoke-Hibernate $vm -ComputerSystem (Uses the RequestStateChange
>> ability)
>
> Question: What Powershell module provides "Invoke-Hibernate"? It's not
> present on my Windows 11 system that is running Hyper-V, and I can't
> find any documentation about it on the web. Or maybe Invoke-Hibernate
> is a Powershell *script*?
Powertest is one of the internal packages, which has some commands to
trigger host-initiated hibernation. I should probably have mentioned
that in my original comment.
>
>>
>> #1 does not remove the VF before sending the hibernate message to the VM
>> via hv_utils, but #2 does.
>>
>> With both #1 and #2, during resume, the host offers the vPCI vmbus
>> device before the vmbus_onoffers_delivered() is called. Whether or not
>> VFs are removed doesn't matter here, because during resume the first
>> fresh kernel always requests the VF device, meaning it has become a
>> boot-time device when the 'old' kernel is resuming back. So the issue we
>> are discussing will not happen in practice and the patch won't break
>> things and won't print spurious warnings. If its OK, please let me know,
>> I'll then proceed with v3.
>>
>
> Ah, this is interesting. I'm assuming these are the details:
>
> 1) VM boots with the intent of resuming from hibernation (though
> Hyper-V doesn't know about that intent)
> 2) Original fresh kernel is loaded and begins initialization
> 3) VMBus offers come in for boot-time devices, which excludes SR-IOV VFs.
> 4) ALLOFFERS_DELIVERED message comes in
> 5) The storvsc driver initializes for the virtual disks on the VM
> 6) Kernel initialization code finds and reads the swap space to see if a
> hibernation image is present. If so, it reads in the hibernation image.
> 7) The suspend sequence is initiated (just like during hibernation)
> to shutdown the VMBus devices and terminate the VMBus connection.
> 8) Control is transferred to the previously read-in hibernation image
> 9) The hibernation image runs the resume sequence, which
> initiates a new VMBus connection and requests offers
> 10) VMBus offers come in for whatever VMBus devices were present
> when Step 7 initiated the suspend sequence. If a VF device was present
> at that time, an offer for that VF device will come in and will match up
> with the VF that was present in the VM at the time of hibernation.
> 11) ALLOFFERS_DELIVERED message comes in again for the
> newly initiated VMBus connection.
>
3), 4) works differently IMO. There is no request_for_offers, or
ALLOFFERS_DELIVERED for fresh kernel. Otherwise on adding the prints in
kernel, we should have seen these function calls *twice* in one
hibernation-resume cycle. But that is not the case.
When the older/original kernel boots up, and requests offers, it gets
those VF offers again as part of boot time offers, and then
ALLOFFERS_DELIVERED msg comes. I'm still trying to figure out how fresh
kernel requests for VF offers or if it gets those offers automatically
from the host. I will update my findings so that it can be put up in
documentation which you mentioned.
> The netvsc driver gets initialized *after* step 4, but we don't know
> exactly *when* relative to the storvsc driver. The netvsc driver must
> tell Hyper-V that it can handle an SR-IOV VF, and the VF offer is sent
> sometime after that. While this netvsc/VF sequence is happening, the
> storvsc driver is reading the hibernation image from swap (Step 6).
>
Maybe this is how fresh kernel gets the offers for VF devices.
> I think the sequence you describe works when reading the
> hibernation image from swap takes 10's of seconds, or even several
> minutes in an Azure VM with a remote disk. That gives plenty
> of time for the VF to get initialized and be fully present when Step 7
> starts. But there's no *guarantee* that the VF is initialized by then.
> It's also not clear to me what action by the guest causes Hyper-V to
> treat the VF as "added to the VM" so that in Step 10 the VF offer is
> sent before ALLOFFERS_DELIVERED.
>
> The sequence you describe also happens in an Azure VM, even if
> the VF is removed before hibernation. When the VF offer arrives
> during Step 10, it doesn't match with any VFs that were in the VM
> at the time of hibernation. It's treated as a new device, just like it
> would be if the offer arrived after ALLOFFERS_DELIVERED.
>
> But it seems like there's still the risk of having a fast swap disk
> and a small hibernation image that can be read in a shorter amount
> of time than it takes to initialize the VF to the point that Hyper-V
> treats it as added to the VM. Without knowing what that point is,
> it's hard to assess the likelihood of that happening. Or maybe there's
> an interlock I'm not aware of that ensures Step 7 can't proceed
> while the netvsc/VF sequence is in progress.
>
> So maybe it's best to proceed with this patch, and deal with the
> risk later when/if it becomes reality. I'm OK if you want to do
> that. This has been an interesting discussion that I'll try to capture
> in some high-level documentation about how Linux guests on
> Hyper-V do hibernation!
>
> Michael
I have sent v3 with the changes we discussed.
Regards,
Naman
^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [PATCH v2 2/2] Drivers: hv: vmbus: Log on missing offers
2024-11-13 8:47 ` Naman Jain
@ 2024-11-13 15:26 ` Michael Kelley
2024-11-14 6:53 ` Naman Jain
0 siblings, 1 reply; 14+ messages in thread
From: Michael Kelley @ 2024-11-13 15:26 UTC (permalink / raw)
To: Naman Jain, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui
Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
John Starks, jacob.pan@linux.microsoft.com, Easwar Hariharan,
Saurabh Singh Sengar
From: Naman Jain <namjain@linux.microsoft.com> Sent: Wednesday, November 13, 2024 12:47 AM
>
> On 11/12/2024 8:43 AM, Michael Kelley wrote:
> > From: Naman Jain <namjain@linux.microsoft.com> Sent: Sunday, November 10, 2024 9:44 PM
> >>
> >> On 11/7/2024 11:14 AM, Naman Jain wrote:
> >>>
> >>> On 11/1/2024 12:44 AM, Michael Kelley wrote:
> >>>> From: Naman Jain <namjain@linux.microsoft.com> Sent: Tuesday, October 29, 2024 1:02 AM
> >>>>>
> >
> > [snip]
> >
> >>>>> @@ -2494,6 +2495,22 @@ static int vmbus_bus_resume(struct device *dev)
> >>>>>
> >>>>> vmbus_request_offers();
> >>>>>
> >>>>> + mutex_lock(&vmbus_connection.channel_mutex);
> >>>>> + list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
> >>>>> + if (channel->offermsg.child_relid != INVALID_RELID)
> >>>>> + continue;
> >>>>> +
> >>>>> + /* hvsock channels are not expected to be present. */
> >>>>> + if (is_hvsock_channel(channel))
> >>>>> + continue;
> >>>>> +
> >>>>> + pr_err("channel %pUl/%pUl not present after resume.\n",
> >>>>> + &channel->offermsg.offer.if_type,
> >>>>> + &channel->offermsg.offer.if_instance);
> >>>>> + /* ToDo: Cleanup these channels here */
> >>>>> + }
> >>>>> + mutex_unlock(&vmbus_connection.channel_mutex);
> >>>>> +
> >>>>
> >>>> Dexuan and John have explained how in Azure VMs, there should not be
> >>>> any VFs assigned to the VM at the time of hibernation. So the above
> >>>> check for missing offers does not trigger an error message due to
> >>>> VF offers coming after the all-offers-received message.
> >>>>
> >>>> But what about the case of a VM running on a local Hyper-V? I'm not
> >>>> completely clear, but in that case I don't think any VFs are removed
> >>>> before the hibernation, especially for VM-initiated hibernation. It's
> >>>
> >>> I am not sure about this behavior. I have requested Dexuan offline
> >>> for a comment.
> >>>
> >>>> a reasonable scenario to later resume that same VM, with the same
> >>>> VF assigned to the VM. Because of the way current code counts
> >>>> the offers, vmbus_bus_resume() waits for the VF to be offered again,
> >>>> and all the channels get correct post-resume relids. But the changes
> >>>> in this patch set break that scenario. Since vmbus_bus_resume() now
> >>>> proceeds before the VF offer arrives, hv_pci_resume() calling
> >>>> vmbus_open() could use the pre-hibernation relid for the VF and break
> >>>> things. Certainly the "not present after resume" error message would
> >>>> be spurious.
> >>>>
> >>>> Maybe the focus here is Azure, and it's tolerable for the local Hyper-V
> >>>> case with a VF to not work pending later fixes. But I thought I'd call
> >>>> out the potential issue (assuming my thinking is correct).
> >>>>
> >>>> Michael
> >>>
> >>> IIUC, below scenarios can happen based on your comment-
> >>>
> >>> Case 1:
> >>> VF channel offer is received in time before hv_pci_resume() and there
> >>> are no problems.
> >>>
> >>> Case 2:
> >>> Resume proceeds just after getting ALLOFFERS_DELIVERED msg and a warning
> >>> is printed that this VF channel is not present after resume.
> >>> Then two scenarios can happen:
> >>> Case 2.1:
> >>> VF channel offer is received before hv_pci_resume() and things work
> >>> fine. Warning printed is spurious.
> >>> Case 2.2:
> >>> VM channel offer is not received before hv_pci_resume() and relid is
> >>> not yet restored by onoffer. This is a problem. Warning is printed in
> >>> this case for missing offer.
> >>>
> >>> I think it all depends on whether or not VFs are removed in local
> >>> HyperV VMs. I'll try to get this information. Thanks for pointing this
> >>> out.
> >>>
> >>> Regards,
> >>> Naman
> >>>
> >>
> >> Hi Michael,
> >> I discussed with Dexuan and we tried these scenarios. Here are the
> >> observations:
> >>
> >> For the two ways of host initiated hibernation:
> >>
> >> #1: Invoke-Hibernate $vm -Device (Uses the guest shutdown component)
> >> OR
> >> #2: Invoke-Hibernate $vm -ComputerSystem (Uses the RequestStateChange
> >> ability)
> >
> > Question: What Powershell module provides "Invoke-Hibernate"? It's not
> > present on my Windows 11 system that is running Hyper-V, and I can't
> > find any documentation about it on the web. Or maybe Invoke-Hibernate
> > is a Powershell *script*?
>
> Powertest is one of the internal packages, which has some commands to
> trigger host-initiated hibernation. I should probably have mentioned
> that in my original comment.
>
Makes sense. Since there was no mention of it on the web, I was
reaching the conclusion that it must be something internal. There are
public postings on how to twiddle power settings to do hibernation,
so hopefully I can create my own equivalent.
> >
> >>
> >> #1 does not remove the VF before sending the hibernate message to the VM
> >> via hv_utils, but #2 does.
> >>
> >> With both #1 and #2, during resume, the host offers the vPCI vmbus
> >> device before the vmbus_onoffers_delivered() is called. Whether or not
> >> VFs are removed doesn't matter here, because during resume the first
> >> fresh kernel always requests the VF device, meaning it has become a
> >> boot-time device when the 'old' kernel is resuming back. So the issue we
> >> are discussing will not happen in practice and the patch won't break
> >> things and won't print spurious warnings. If its OK, please let me know,
> >> I'll then proceed with v3.
> >>
> >
> > Ah, this is interesting. I'm assuming these are the details:
> >
> > 1) VM boots with the intent of resuming from hibernation (though
> > Hyper-V doesn't know about that intent)
> > 2) Original fresh kernel is loaded and begins initialization
> > 3) VMBus offers come in for boot-time devices, which excludes SR-IOV VFs.
> > 4) ALLOFFERS_DELIVERED message comes in
> > 5) The storvsc driver initializes for the virtual disks on the VM
> > 6) Kernel initialization code finds and reads the swap space to see if a
> > hibernation image is present. If so, it reads in the hibernation image.
> > 7) The suspend sequence is initiated (just like during hibernation)
> > to shutdown the VMBus devices and terminate the VMBus connection.
> > 8) Control is transferred to the previously read-in hibernation image
> > 9) The hibernation image runs the resume sequence, which
> > initiates a new VMBus connection and requests offers
> > 10) VMBus offers come in for whatever VMBus devices were present
> > when Step 7 initiated the suspend sequence. If a VF device was present
> > at that time, an offer for that VF device will come in and will match up
> > with the VF that was present in the VM at the time of hibernation.
> > 11) ALLOFFERS_DELIVERED message comes in again for the
> > newly initiated VMBus connection.
> >
>
> 3), 4) works differently IMO. There is no request_for_offers, or
> ALLOFFERS_DELIVERED for fresh kernel. Otherwise on adding the prints in
> kernel, we should have seen these function calls *twice* in one
> hibernation-resume cycle. But that is not the case.
>
> When the older/original kernel boots up, and requests offers, it gets
> those VF offers again as part of boot time offers, and then
> ALLOFFERS_DELIVERED msg comes. I'm still trying to figure out how fresh
> kernel requests for VF offers or if it gets those offers automatically
> from the host. I will update my findings so that it can be put up in
> documentation which you mentioned.
Hmmm. I'm not sure what might be happening. I'll be interested in
what you find. I do indeed want to call out the details in my
documentation. And I'll also try to repro myself.
Michael
>
> > The netvsc driver gets initialized *after* step 4, but we don't know
> > exactly *when* relative to the storvsc driver. The netvsc driver must
> > tell Hyper-V that it can handle an SR-IOV VF, and the VF offer is sent
> > sometime after that. While this netvsc/VF sequence is happening, the
> > storvsc driver is reading the hibernation image from swap (Step 6).
> >
>
> Maybe this is how fresh kernel gets the offers for VF devices.
>
> > I think the sequence you describe works when reading the
> > hibernation image from swap takes 10's of seconds, or even several
> > minutes in an Azure VM with a remote disk. That gives plenty
> > of time for the VF to get initialized and be fully present when Step 7
> > starts. But there's no *guarantee* that the VF is initialized by then.
> > It's also not clear to me what action by the guest causes Hyper-V to
> > treat the VF as "added to the VM" so that in Step 10 the VF offer is
> > sent before ALLOFFERS_DELIVERED.
> >
> > The sequence you describe also happens in an Azure VM, even if
> > the VF is removed before hibernation. When the VF offer arrives
> > during Step 10, it doesn't match with any VFs that were in the VM
> > at the time of hibernation. It's treated as a new device, just like it
> > would be if the offer arrived after ALLOFFERS_DELIVERED.
> >
> > But it seems like there's still the risk of having a fast swap disk
> > and a small hibernation image that can be read in a shorter amount
> > of time than it takes to initialize the VF to the point that Hyper-V
> > treats it as added to the VM. Without knowing what that point is,
> > it's hard to assess the likelihood of that happening. Or maybe there's
> > an interlock I'm not aware of that ensures Step 7 can't proceed
> > while the netvsc/VF sequence is in progress.
> >
> > So maybe it's best to proceed with this patch, and deal with the
> > risk later when/if it becomes reality. I'm OK if you want to do
> > that. This has been an interesting discussion that I'll try to capture
> > in some high-level documentation about how Linux guests on
> > Hyper-V do hibernation!
> >
> > Michael
>
>
>
> I have sent v3 with the changes we discussed.
>
> Regards,
> Naman
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 2/2] Drivers: hv: vmbus: Log on missing offers
2024-11-13 15:26 ` Michael Kelley
@ 2024-11-14 6:53 ` Naman Jain
0 siblings, 0 replies; 14+ messages in thread
From: Naman Jain @ 2024-11-14 6:53 UTC (permalink / raw)
To: Michael Kelley, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui
Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
John Starks, jacob.pan@linux.microsoft.com, Easwar Hariharan,
Saurabh Singh Sengar
On 11/13/2024 8:56 PM, Michael Kelley wrote:
> From: Naman Jain <namjain@linux.microsoft.com> Sent: Wednesday, November 13, 2024 12:47 AM
>>
>> On 11/12/2024 8:43 AM, Michael Kelley wrote:
>>> From: Naman Jain <namjain@linux.microsoft.com> Sent: Sunday, November 10, 2024 9:44 PM
>>>>
>>>> On 11/7/2024 11:14 AM, Naman Jain wrote:
>>>>>
>>>>> On 11/1/2024 12:44 AM, Michael Kelley wrote:
>>>>>> From: Naman Jain <namjain@linux.microsoft.com> Sent: Tuesday, October 29, 2024 1:02 AM
>>>>>>>
>>>
>>> [snip]
<snip>
>>>
>>> 1) VM boots with the intent of resuming from hibernation (though
>>> Hyper-V doesn't know about that intent)
>>> 2) Original fresh kernel is loaded and begins initialization
>>> 3) VMBus offers come in for boot-time devices, which excludes SR-IOV VFs.
>>> 4) ALLOFFERS_DELIVERED message comes in
>>> 5) The storvsc driver initializes for the virtual disks on the VM
>>> 6) Kernel initialization code finds and reads the swap space to see if a
>>> hibernation image is present. If so, it reads in the hibernation image.
>>> 7) The suspend sequence is initiated (just like during hibernation)
>>> to shutdown the VMBus devices and terminate the VMBus connection.
>>> 8) Control is transferred to the previously read-in hibernation image
>>> 9) The hibernation image runs the resume sequence, which
>>> initiates a new VMBus connection and requests offers
>>> 10) VMBus offers come in for whatever VMBus devices were present
>>> when Step 7 initiated the suspend sequence. If a VF device was present
>>> at that time, an offer for that VF device will come in and will match up
>>> with the VF that was present in the VM at the time of hibernation.
>>> 11) ALLOFFERS_DELIVERED message comes in again for the
>>> newly initiated VMBus connection.
>>>
>>
>> 3), 4) works differently IMO. There is no request_for_offers, or
>> ALLOFFERS_DELIVERED for fresh kernel. Otherwise on adding the prints in
>> kernel, we should have seen these function calls *twice* in one
>> hibernation-resume cycle. But that is not the case.
>>
I was looking at the wrong place for fresh kernel logs. The sequence you
mentioned is indeed correct and aligns to my understanding and
experiments results. Kindly ignore my comment above.
>> When the older/original kernel boots up, and requests offers, it gets
>> those VF offers again as part of boot time offers, and then
>> ALLOFFERS_DELIVERED msg comes. I'm still trying to figure out how fresh
>> kernel requests for VF offers or if it gets those offers automatically
>> from the host. I will update my findings so that it can be put up in
>> documentation which you mentioned.
Fresh kernel does not seem to be getting these VF channel offers
automatically, but resuming kernel does, when it calls request_for_offers().
Regards,
Naman
>
> Hmmm. I'm not sure what might be happening. I'll be interested in
> what you find. I do indeed want to call out the details in my
> documentation. And I'll also try to repro myself.
>
> Michael
>
>>
>>> The netvsc driver gets initialized *after* step 4, but we don't know
>>> exactly *when* relative to the storvsc driver. The netvsc driver must
>>> tell Hyper-V that it can handle an SR-IOV VF, and the VF offer is sent
>>> sometime after that. While this netvsc/VF sequence is happening, the
>>> storvsc driver is reading the hibernation image from swap (Step 6).
>>>
>>
>> Maybe this is how fresh kernel gets the offers for VF devices.
>>
>>> I think the sequence you describe works when reading the
>>> hibernation image from swap takes 10's of seconds, or even several
>>> minutes in an Azure VM with a remote disk. That gives plenty
>>> of time for the VF to get initialized and be fully present when Step 7
>>> starts. But there's no *guarantee* that the VF is initialized by then.
>>> It's also not clear to me what action by the guest causes Hyper-V to
>>> treat the VF as "added to the VM" so that in Step 10 the VF offer is
>>> sent before ALLOFFERS_DELIVERED.
>>>
>>> The sequence you describe also happens in an Azure VM, even if
>>> the VF is removed before hibernation. When the VF offer arrives
>>> during Step 10, it doesn't match with any VFs that were in the VM
>>> at the time of hibernation. It's treated as a new device, just like it
>>> would be if the offer arrived after ALLOFFERS_DELIVERED.
>>>
>>> But it seems like there's still the risk of having a fast swap disk
>>> and a small hibernation image that can be read in a shorter amount
>>> of time than it takes to initialize the VF to the point that Hyper-V
>>> treats it as added to the VM. Without knowing what that point is,
>>> it's hard to assess the likelihood of that happening. Or maybe there's
>>> an interlock I'm not aware of that ensures Step 7 can't proceed
>>> while the netvsc/VF sequence is in progress.
>>>
>>> So maybe it's best to proceed with this patch, and deal with the
>>> risk later when/if it becomes reality. I'm OK if you want to do
>>> that. This has been an interesting discussion that I'll try to capture
>>> in some high-level documentation about how Linux guests on
>>> Hyper-V do hibernation!
>>>
>>> Michael
>>
>>
>>
>> I have sent v3 with the changes we discussed.
>>
>> Regards,
>> Naman
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2024-11-14 6:53 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-29 8:01 [PATCH v2 0/2] Drivers: hv: vmbus: Wait for offers and log missing offers Naman Jain
2024-10-29 8:01 ` [PATCH v2 1/2] Drivers: hv: vmbus: Wait for offers during boot Naman Jain
2024-10-30 20:31 ` Easwar Hariharan
2024-11-05 5:40 ` Naman Jain
2024-10-31 19:13 ` Michael Kelley
2024-11-05 5:40 ` Naman Jain
2024-10-29 8:01 ` [PATCH v2 2/2] Drivers: hv: vmbus: Log on missing offers Naman Jain
2024-10-31 19:14 ` Michael Kelley
2024-11-07 5:44 ` Naman Jain
2024-11-11 5:43 ` Naman Jain
2024-11-12 3:13 ` Michael Kelley
2024-11-13 8:47 ` Naman Jain
2024-11-13 15:26 ` Michael Kelley
2024-11-14 6:53 ` Naman Jain
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).