[PATCH 0/2] Drivers: hv: vmbus: Wait for offers and log missing offers

linux-hyperv.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/2] Drivers: hv: vmbus: Wait for offers and log missing offers
@ 2024-10-18 11:58 Naman Jain
  2024-10-18 11:58 ` [PATCH 1/2] Drivers: hv: vmbus: Wait for offers during boot Naman Jain
  2024-10-18 11:58 ` [PATCH 2/2] Drivers: hv: vmbus: Log on missing offers Naman Jain
  0 siblings, 2 replies; 17+ messages in thread
From: Naman Jain @ 2024-10-18 11:58 UTC (permalink / raw)
  To: K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui
  Cc: linux-hyperv, linux-kernel, Naman Jain, John Starks, jacob.pan,
	Easwar Hariharan

After VM requests for channel offers during boot or resume from
hibernation, host offers the available channels and then sends a
separate message when all offers are delivered. Wait for this
message to make this offers request and receipt process synchronous.

This is to support user mode (VTL0) in OpenHCL (A Linux based
paravisor for Confidential VMs) to ensure that all channel offers
are present when init begins in VTL0, and it knows which channels
the host has offered and which it has not.

This is in analogy to a PCI bus not returning from probe until it has
scanned all devices on the bus.

As part of this implementation, some code cleanup is also done for the
logic which becomes redundant due to this change.

Second patch prints the channels which are not offered when resume
happens from hibernation to supply more information to the end user.

John Starks (1):
  Drivers: hv: vmbus: Log on missing offers

Naman Jain (1):
  Drivers: hv: vmbus: Wait for offers during boot

 drivers/hv/channel_mgmt.c | 38 +++++++++++++++++++++++---------------
 drivers/hv/connection.c   |  4 ++--
 drivers/hv/hyperv_vmbus.h | 14 +++-----------
 drivers/hv/vmbus_drv.c    | 30 +++++++++++++++---------------
 4 files changed, 43 insertions(+), 43 deletions(-)

-- 
2.34.1

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/2] Drivers: hv: vmbus: Wait for offers during boot
  2024-10-18 11:58 [PATCH 0/2] Drivers: hv: vmbus: Wait for offers and log missing offers Naman Jain
@ 2024-10-18 11:58 ` Naman Jain
  2024-10-18 22:52   ` Easwar Hariharan
                     ` (2 more replies)
  2024-10-18 11:58 ` [PATCH 2/2] Drivers: hv: vmbus: Log on missing offers Naman Jain
  1 sibling, 3 replies; 17+ messages in thread
From: Naman Jain @ 2024-10-18 11:58 UTC (permalink / raw)
  To: K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui
  Cc: linux-hyperv, linux-kernel, Naman Jain, John Starks, jacob.pan,
	Easwar Hariharan

Channels offers are requested during vmbus initialization and resume
from hibernation. Add support to wait for all channel offers to be
delivered and processed before returning from vmbus_request_offers.
This is to support user mode (VTL0) in OpenHCL (A Linux based
paravisor for Confidential VMs) to ensure that all channel offers
are present when init begins in VTL0, and it knows which channels
the host has offered and which it has not.

This is in analogy to a PCI bus not returning from probe until it has
scanned all devices on the bus.

Without this, user mode can race with vmbus initialization and miss
channel offers. User mode has no way to work around this other than
sleeping for a while, since there is no way to know when vmbus has
finished processing offers.

With this added functionality, remove earlier logic which keeps track
of count of offered channels post resume from hibernation. Once all
offers delivered message is received, no further offers are going to
be received. Consequently, logic to prevent suspend from happening
after previous resume had missing offers, is also removed.

Co-developed-by: John Starks <jostarks@microsoft.com>
Signed-off-by: John Starks <jostarks@microsoft.com>
Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
---
 drivers/hv/channel_mgmt.c | 38 +++++++++++++++++++++++---------------
 drivers/hv/connection.c   |  4 ++--
 drivers/hv/hyperv_vmbus.h | 14 +++-----------
 drivers/hv/vmbus_drv.c    | 16 ----------------
 4 files changed, 28 insertions(+), 44 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 3c6011a48dab..ac514805dafe 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -944,16 +944,6 @@ void vmbus_initiate_unload(bool crash)
 		vmbus_wait_for_unload();
 }
 
-static void check_ready_for_resume_event(void)
-{
-	/*
-	 * If all the old primary channels have been fixed up, then it's safe
-	 * to resume.
-	 */
-	if (atomic_dec_and_test(&vmbus_connection.nr_chan_fixup_on_resume))
-		complete(&vmbus_connection.ready_for_resume_event);
-}
-
 static void vmbus_setup_channel_state(struct vmbus_channel *channel,
 				      struct vmbus_channel_offer_channel *offer)
 {
@@ -1109,8 +1099,6 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
 
 		/* Add the channel back to the array of channels. */
 		vmbus_channel_map_relid(oldchannel);
-		check_ready_for_resume_event();
-
 		mutex_unlock(&vmbus_connection.channel_mutex);
 		return;
 	}
@@ -1297,12 +1285,11 @@ EXPORT_SYMBOL_GPL(vmbus_hvsock_device_unregister);
 /*
  * vmbus_onoffers_delivered -
  * This is invoked when all offers have been delivered.
- *
- * Nothing to do here.
  */
 static void vmbus_onoffers_delivered(
 			struct vmbus_channel_message_header *hdr)
 {
+	complete(&vmbus_connection.all_offers_delivered_event);
 }
 
 /*
@@ -1578,7 +1565,8 @@ void vmbus_onmessage(struct vmbus_channel_message_header *hdr)
 }
 
 /*
- * vmbus_request_offers - Send a request to get all our pending offers.
+ * vmbus_request_offers - Send a request to get all our pending offers
+ * and wait for all offers to arrive.
  */
 int vmbus_request_offers(void)
 {
@@ -1596,6 +1584,10 @@ int vmbus_request_offers(void)
 
 	msg->msgtype = CHANNELMSG_REQUESTOFFERS;
 
+	/*
+	 * This REQUESTOFFERS message will result in the host sending an all
+	 * offers delivered message.
+	 */
 	ret = vmbus_post_msg(msg, sizeof(struct vmbus_channel_message_header),
 			     true);
 
@@ -1607,6 +1599,22 @@ int vmbus_request_offers(void)
 		goto cleanup;
 	}
 
+	/* Wait for the host to send all offers. */
+	while (wait_for_completion_timeout(
+		&vmbus_connection.all_offers_delivered_event, msecs_to_jiffies(10 * 1000)) == 0) {
+		pr_warn("timed out waiting for all offers to be delivered...\n");
+	}
+
+	/*
+	 * Flush handling of offer messages (which may initiate work on
+	 * other work queues).
+	 */
+	flush_workqueue(vmbus_connection.work_queue);
+
+	/* Flush processing the incoming offers. */
+	flush_workqueue(vmbus_connection.handle_primary_chan_wq);
+	flush_workqueue(vmbus_connection.handle_sub_chan_wq);
+
 cleanup:
 	kfree(msginfo);
 
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index f001ae880e1d..8351360bba16 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -34,8 +34,8 @@ struct vmbus_connection vmbus_connection = {
 
 	.ready_for_suspend_event = COMPLETION_INITIALIZER(
 				  vmbus_connection.ready_for_suspend_event),
-	.ready_for_resume_event	= COMPLETION_INITIALIZER(
-				  vmbus_connection.ready_for_resume_event),
+	.all_offers_delivered_event = COMPLETION_INITIALIZER(
+				  vmbus_connection.all_offers_delivered_event),
 };
 EXPORT_SYMBOL_GPL(vmbus_connection);
 
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index d2856023d53c..80cc65dac740 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -287,18 +287,10 @@ struct vmbus_connection {
 	struct completion ready_for_suspend_event;
 
 	/*
-	 * The number of primary channels that should be "fixed up"
-	 * upon resume: these channels are re-offered upon resume, and some
-	 * fields of the channel offers (i.e. child_relid and connection_id)
-	 * can change, so the old offermsg must be fixed up, before the resume
-	 * callbacks of the VSC drivers start to further touch the channels.
+	 * Completed once the host has offered all channels. Note that
+	 * some channels may still be being process on a work queue.
 	 */
-	atomic_t nr_chan_fixup_on_resume;
-	/*
-	 * vmbus_bus_resume() waits for "nr_chan_fixup_on_resume" to
-	 * drop to zero.
-	 */
-	struct completion ready_for_resume_event;
+	struct completion all_offers_delivered_event;
 };
 
 
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 9b15f7daf505..bd3fc41dc06b 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -2427,11 +2427,6 @@ static int vmbus_bus_suspend(struct device *dev)
 	if (atomic_read(&vmbus_connection.nr_chan_close_on_suspend) > 0)
 		wait_for_completion(&vmbus_connection.ready_for_suspend_event);
 
-	if (atomic_read(&vmbus_connection.nr_chan_fixup_on_resume) != 0) {
-		pr_err("Can not suspend due to a previous failed resuming\n");
-		return -EBUSY;
-	}
-
 	mutex_lock(&vmbus_connection.channel_mutex);
 
 	list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
@@ -2456,17 +2451,12 @@ static int vmbus_bus_suspend(struct device *dev)
 			pr_err("Sub-channel not deleted!\n");
 			WARN_ON_ONCE(1);
 		}
-
-		atomic_inc(&vmbus_connection.nr_chan_fixup_on_resume);
 	}
 
 	mutex_unlock(&vmbus_connection.channel_mutex);
 
 	vmbus_initiate_unload(false);
 
-	/* Reset the event for the next resume. */
-	reinit_completion(&vmbus_connection.ready_for_resume_event);
-
 	return 0;
 }
 
@@ -2502,14 +2492,8 @@ static int vmbus_bus_resume(struct device *dev)
 	if (ret != 0)
 		return ret;
 
-	WARN_ON(atomic_read(&vmbus_connection.nr_chan_fixup_on_resume) == 0);
-
 	vmbus_request_offers();
 
-	if (wait_for_completion_timeout(
-		&vmbus_connection.ready_for_resume_event, 10 * HZ) == 0)
-		pr_err("Some vmbus device is missing after suspending?\n");
-
 	/* Reset the event for the next suspend. */
 	reinit_completion(&vmbus_connection.ready_for_suspend_event);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/2] Drivers: hv: vmbus: Log on missing offers
  2024-10-18 11:58 [PATCH 0/2] Drivers: hv: vmbus: Wait for offers and log missing offers Naman Jain
  2024-10-18 11:58 ` [PATCH 1/2] Drivers: hv: vmbus: Wait for offers during boot Naman Jain
@ 2024-10-18 11:58 ` Naman Jain
  2024-10-21  2:43   ` Easwar Hariharan
  2024-10-21  4:26   ` Saurabh Singh Sengar
  1 sibling, 2 replies; 17+ messages in thread
From: Naman Jain @ 2024-10-18 11:58 UTC (permalink / raw)
  To: K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui
  Cc: linux-hyperv, linux-kernel, Naman Jain, John Starks, jacob.pan,
	Easwar Hariharan

From: John Starks <jostarks@microsoft.com>

When resuming from hibernation, log any channels that were present
before hibernation but now are gone.

Signed-off-by: John Starks <jostarks@microsoft.com>
Co-developed-by: Naman Jain <namjain@linux.microsoft.com>
Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
---
 drivers/hv/vmbus_drv.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index bd3fc41dc06b..1f56d138210e 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -2462,6 +2462,7 @@ static int vmbus_bus_suspend(struct device *dev)
 
 static int vmbus_bus_resume(struct device *dev)
 {
+	struct vmbus_channel *channel;
 	struct vmbus_channel_msginfo *msginfo;
 	size_t msgsize;
 	int ret;
@@ -2494,6 +2495,21 @@ static int vmbus_bus_resume(struct device *dev)
 
 	vmbus_request_offers();
 
+	mutex_lock(&vmbus_connection.channel_mutex);
+	list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
+		if (channel->offermsg.child_relid != INVALID_RELID)
+			continue;
+
+		/* hvsock channels are not expected to be present. */
+		if (is_hvsock_channel(channel))
+			continue;
+
+		pr_err("channel %pUl/%pUl not present after resume.\n",
+			&channel->offermsg.offer.if_type,
+			&channel->offermsg.offer.if_instance);
+	}
+	mutex_unlock(&vmbus_connection.channel_mutex);
+
 	/* Reset the event for the next suspend. */
 	reinit_completion(&vmbus_connection.ready_for_suspend_event);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] Drivers: hv: vmbus: Wait for offers during boot
  2024-10-18 11:58 ` [PATCH 1/2] Drivers: hv: vmbus: Wait for offers during boot Naman Jain
@ 2024-10-18 22:52   ` Easwar Hariharan
  2024-10-21  4:33   ` Michael Kelley
  2024-10-21  4:57   ` Saurabh Singh Sengar
  2 siblings, 0 replies; 17+ messages in thread
From: Easwar Hariharan @ 2024-10-18 22:52 UTC (permalink / raw)
  To: Naman Jain, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui
  Cc: eahariha, linux-hyperv, linux-kernel, John Starks, jacob.pan

On 10/18/2024 4:58 AM, Naman Jain wrote:
> Channels offers are requested during vmbus initialization and resume
> from hibernation. Add support to wait for all channel offers to be
> delivered and processed before returning from vmbus_request_offers.
> This is to support user mode (VTL0) in OpenHCL (A Linux based
> paravisor for Confidential VMs) to ensure that all channel offers
> are present when init begins in VTL0, and it knows which channels
> the host has offered and which it has not.
> 
> This is in analogy to a PCI bus not returning from probe until it has
> scanned all devices on the bus.
> 
> Without this, user mode can race with vmbus initialization and miss
> channel offers. User mode has no way to work around this other than
> sleeping for a while, since there is no way to know when vmbus has
> finished processing offers.
> 
> With this added functionality, remove earlier logic which keeps track
> of count of offered channels post resume from hibernation. Once all
> offers delivered message is received, no further offers are going to
> be received. Consequently, logic to prevent suspend from happening
> after previous resume had missing offers, is also removed.
> 
> Co-developed-by: John Starks <jostarks@microsoft.com>
> Signed-off-by: John Starks <jostarks@microsoft.com>
> Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
> ---
>  drivers/hv/channel_mgmt.c | 38 +++++++++++++++++++++++---------------
>  drivers/hv/connection.c   |  4 ++--
>  drivers/hv/hyperv_vmbus.h | 14 +++-----------
>  drivers/hv/vmbus_drv.c    | 16 ----------------
>  4 files changed, 28 insertions(+), 44 deletions(-)

Looks good to me.

Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] Drivers: hv: vmbus: Log on missing offers
  2024-10-18 11:58 ` [PATCH 2/2] Drivers: hv: vmbus: Log on missing offers Naman Jain
@ 2024-10-21  2:43   ` Easwar Hariharan
  2024-10-21  4:26   ` Saurabh Singh Sengar
  1 sibling, 0 replies; 17+ messages in thread
From: Easwar Hariharan @ 2024-10-21  2:43 UTC (permalink / raw)
  To: Naman Jain, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui
  Cc: eahariha, linux-hyperv, linux-kernel, John Starks, jacob.pan

On 10/18/2024 4:58 AM, Naman Jain wrote:
> From: John Starks <jostarks@microsoft.com>
> 
> When resuming from hibernation, log any channels that were present
> before hibernation but now are gone.
> 
> Signed-off-by: John Starks <jostarks@microsoft.com>
> Co-developed-by: Naman Jain <namjain@linux.microsoft.com>
> Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
> ---
>  drivers/hv/vmbus_drv.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
>

Looks good to me.

Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] Drivers: hv: vmbus: Log on missing offers
  2024-10-18 11:58 ` [PATCH 2/2] Drivers: hv: vmbus: Log on missing offers Naman Jain
  2024-10-21  2:43   ` Easwar Hariharan
@ 2024-10-21  4:26   ` Saurabh Singh Sengar
  2024-10-21  4:31     ` Naman Jain
  1 sibling, 1 reply; 17+ messages in thread
From: Saurabh Singh Sengar @ 2024-10-21  4:26 UTC (permalink / raw)
  To: Naman Jain
  Cc: K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	linux-hyperv, linux-kernel, John Starks, jacob.pan,
	Easwar Hariharan

On Fri, Oct 18, 2024 at 04:58:11AM -0700, Naman Jain wrote:
> From: John Starks <jostarks@microsoft.com>
> 
> When resuming from hibernation, log any channels that were present
> before hibernation but now are gone.
> 
> Signed-off-by: John Starks <jostarks@microsoft.com>
> Co-developed-by: Naman Jain <namjain@linux.microsoft.com>
> Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
> ---
>  drivers/hv/vmbus_drv.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index bd3fc41dc06b..1f56d138210e 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -2462,6 +2462,7 @@ static int vmbus_bus_suspend(struct device *dev)
>  
>  static int vmbus_bus_resume(struct device *dev)
>  {
> +	struct vmbus_channel *channel;
>  	struct vmbus_channel_msginfo *msginfo;
>  	size_t msgsize;
>  	int ret;
> @@ -2494,6 +2495,21 @@ static int vmbus_bus_resume(struct device *dev)
>  
>  	vmbus_request_offers();
>  
> +	mutex_lock(&vmbus_connection.channel_mutex);
> +	list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
> +		if (channel->offermsg.child_relid != INVALID_RELID)
> +			continue;
> +
> +		/* hvsock channels are not expected to be present. */
> +		if (is_hvsock_channel(channel))
> +			continue;
> +
> +		pr_err("channel %pUl/%pUl not present after resume.\n",
> +			&channel->offermsg.offer.if_type,
> +			&channel->offermsg.offer.if_instance);

Do we want to put a TODO here, so as to do cleanup of these channels
like force rescind etc later ?

- Saurabh


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] Drivers: hv: vmbus: Log on missing offers
  2024-10-21  4:26   ` Saurabh Singh Sengar
@ 2024-10-21  4:31     ` Naman Jain
  0 siblings, 0 replies; 17+ messages in thread
From: Naman Jain @ 2024-10-21  4:31 UTC (permalink / raw)
  To: Saurabh Singh Sengar
  Cc: K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	linux-hyperv, linux-kernel, John Starks, jacob.pan,
	Easwar Hariharan



On 10/21/2024 9:56 AM, Saurabh Singh Sengar wrote:
> On Fri, Oct 18, 2024 at 04:58:11AM -0700, Naman Jain wrote:
>> From: John Starks <jostarks@microsoft.com>
>>
>> When resuming from hibernation, log any channels that were present
>> before hibernation but now are gone.
>>
>> Signed-off-by: John Starks <jostarks@microsoft.com>
>> Co-developed-by: Naman Jain <namjain@linux.microsoft.com>
>> Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
>> ---
>>   drivers/hv/vmbus_drv.c | 16 ++++++++++++++++
>>   1 file changed, 16 insertions(+)
>>
>> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
>> index bd3fc41dc06b..1f56d138210e 100644
>> --- a/drivers/hv/vmbus_drv.c
>> +++ b/drivers/hv/vmbus_drv.c
>> @@ -2462,6 +2462,7 @@ static int vmbus_bus_suspend(struct device *dev)
>>   
>>   static int vmbus_bus_resume(struct device *dev)
>>   {
>> +	struct vmbus_channel *channel;
>>   	struct vmbus_channel_msginfo *msginfo;
>>   	size_t msgsize;
>>   	int ret;
>> @@ -2494,6 +2495,21 @@ static int vmbus_bus_resume(struct device *dev)
>>   
>>   	vmbus_request_offers();
>>   
>> +	mutex_lock(&vmbus_connection.channel_mutex);
>> +	list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
>> +		if (channel->offermsg.child_relid != INVALID_RELID)
>> +			continue;
>> +
>> +		/* hvsock channels are not expected to be present. */
>> +		if (is_hvsock_channel(channel))
>> +			continue;
>> +
>> +		pr_err("channel %pUl/%pUl not present after resume.\n",
>> +			&channel->offermsg.offer.if_type,
>> +			&channel->offermsg.offer.if_instance);
> 
> Do we want to put a TODO here, so as to do cleanup of these channels
> like force rescind etc later ?
> 
> - Saurabh


Sure, we can put that.

Regards,
Naman

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH 1/2] Drivers: hv: vmbus: Wait for offers during boot
  2024-10-18 11:58 ` [PATCH 1/2] Drivers: hv: vmbus: Wait for offers during boot Naman Jain
  2024-10-18 22:52   ` Easwar Hariharan
@ 2024-10-21  4:33   ` Michael Kelley
  2024-10-22  9:50     ` Naman Jain
  2024-10-21  4:57   ` Saurabh Singh Sengar
  2 siblings, 1 reply; 17+ messages in thread
From: Michael Kelley @ 2024-10-21  4:33 UTC (permalink / raw)
  To: Naman Jain, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui
  Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	John Starks, jacob.pan@linux.microsoft.com, Easwar Hariharan

From: Naman Jain <namjain@linux.microsoft.com> Sent: Friday, October 18, 2024 4:58 AM
> 
> Channels offers are requested during vmbus initialization and resume
> from hibernation. Add support to wait for all channel offers to be
> delivered and processed before returning from vmbus_request_offers.
> This is to support user mode (VTL0) in OpenHCL (A Linux based
> paravisor for Confidential VMs) to ensure that all channel offers
> are present when init begins in VTL0, and it knows which channels
> the host has offered and which it has not.
> 
> This is in analogy to a PCI bus not returning from probe until it has
> scanned all devices on the bus.
> 
> Without this, user mode can race with vmbus initialization and miss
> channel offers. User mode has no way to work around this other than
> sleeping for a while, since there is no way to know when vmbus has
> finished processing offers.
> 
> With this added functionality, remove earlier logic which keeps track
> of count of offered channels post resume from hibernation. Once all
> offers delivered message is received, no further offers are going to
> be received. Consequently, logic to prevent suspend from happening
> after previous resume had missing offers, is also removed.
> 
> Co-developed-by: John Starks <jostarks@microsoft.com>
> Signed-off-by: John Starks <jostarks@microsoft.com>
> Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
> ---
>  drivers/hv/channel_mgmt.c | 38 +++++++++++++++++++++++---------------
>  drivers/hv/connection.c   |  4 ++--
>  drivers/hv/hyperv_vmbus.h | 14 +++-----------
>  drivers/hv/vmbus_drv.c    | 16 ----------------
>  4 files changed, 28 insertions(+), 44 deletions(-)
> 
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index 3c6011a48dab..ac514805dafe 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -944,16 +944,6 @@ void vmbus_initiate_unload(bool crash)
>  		vmbus_wait_for_unload();
>  }
> 
> -static void check_ready_for_resume_event(void)
> -{
> -	/*
> -	 * If all the old primary channels have been fixed up, then it's safe
> -	 * to resume.
> -	 */
> -	if (atomic_dec_and_test(&vmbus_connection.nr_chan_fixup_on_resume))
> -		complete(&vmbus_connection.ready_for_resume_event);
> -}
> -
>  static void vmbus_setup_channel_state(struct vmbus_channel *channel,
>  				      struct vmbus_channel_offer_channel *offer)
>  {
> @@ -1109,8 +1099,6 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
> 
>  		/* Add the channel back to the array of channels. */
>  		vmbus_channel_map_relid(oldchannel);
> -		check_ready_for_resume_event();
> -
>  		mutex_unlock(&vmbus_connection.channel_mutex);
>  		return;
>  	}
> @@ -1297,12 +1285,11 @@
> EXPORT_SYMBOL_GPL(vmbus_hvsock_device_unregister);
>  /*
>   * vmbus_onoffers_delivered -
>   * This is invoked when all offers have been delivered.
> - *
> - * Nothing to do here.
>   */

I'm unclear on the meaning of the ALLOFFERS_DELIVERED message. In the
early days of Hyper-V, the set of virtual devices in a VM was fixed before a
VM started, so presumably ALLOFFERS_DELIVERED meant that offers for
that fixed set of devices had been delivered. That meaning makes sense
conceptually.

But more recent versions of Hyper-V support adding and removing devices
at any time during the life of the VM. If a device is added, a new offer is
generated. Existing devices (such as netvsc) can reconfigure their channels,
in which case new subchannel offers are generated. So the core concept of
"all offers delivered" isn't valid anymore.

To date Linux hasn't done anything with this message, so we didn't need
precision about what it means. There's no VMBus specification or
documentation, so we need some text here in the comments that
explains precisely what ALLOFFERS_DELIVERED means, and how that
meaning aligns with the fact that offers can be delivered anytime during
the life of the VM.

I'll note that in the past there's also been some behavior where during guest
boot, Hyper-V offers a virtual PCI device to the guest, immediately
rescinds the vPCI device (before Linux even finished processing the offer),
then offers it again. I wonder about how ALLOFFERS_DELIVERED interacts
with that situation.

I ran some tests in an Azure L16s_v3 VM, which has three vPCI devices:
2 NVMe devices and 1 Mellanox CX-5. The offers for the 2 NVMe devices
came before the ALLOFFERS_DELIVERED message, but the offer for
the Mellanox CX-5 came *after* the ALLOFFERS_DELIVERED message.
If that's the way the Mellanox CX-5 offers work, this patch breaks things
in the hibernation resume path because ALLOFFERS_DELIVERED isn't
sufficient to indicate that all primary channels have been re-offered
and the resume can proceed. All sub-channel offers came after the
ALLOFFERS_DELIVERED message, but that is what I expected and
shouldn't be a problem.

It's hard for me to review this patch set without some precision around
what ALLOFFERS_DELIVERED means.

>  static void vmbus_onoffers_delivered(
>  			struct vmbus_channel_message_header *hdr)
>  {
> +	complete(&vmbus_connection.all_offers_delivered_event);
>  }
> 
>  /*
> @@ -1578,7 +1565,8 @@ void vmbus_onmessage(struct vmbus_channel_message_header *hdr)
>  }
> 
>  /*
> - * vmbus_request_offers - Send a request to get all our pending offers.
> + * vmbus_request_offers - Send a request to get all our pending offers
> + * and wait for all offers to arrive.
>   */
>  int vmbus_request_offers(void)
>  {
> @@ -1596,6 +1584,10 @@ int vmbus_request_offers(void)
> 
>  	msg->msgtype = CHANNELMSG_REQUESTOFFERS;
> 
> +	/*
> +	 * This REQUESTOFFERS message will result in the host sending an all
> +	 * offers delivered message.
> +	 */
>  	ret = vmbus_post_msg(msg, sizeof(struct vmbus_channel_message_header),
>  			     true);
> 
> @@ -1607,6 +1599,22 @@ int vmbus_request_offers(void)
>  		goto cleanup;
>  	}
> 
> +	/* Wait for the host to send all offers. */
> +	while (wait_for_completion_timeout(
> +		&vmbus_connection.all_offers_delivered_event, msecs_to_jiffies(10 * 1000)) == 0) {

Maybe do !wait_for_completion_timeout( ...) instead of explicitly testing
for 0? I see that some existing code uses the explicit test for 0, but that's
not the usual kernel code idiom.

> +		pr_warn("timed out waiting for all offers to be delivered...\n");
> +	}

The while loop could continue forever. We don't want to trust the Hyper-V
host to be well-behaved, so there should be an uber timeout where it gives
up after 100 seconds (or pick some value). If the uber timeout is reached,
I'm not sure if the code should panic or just go on -- it's debatable. But doing
something explicit is probably better than repeatedly outputting the
warning message forever.

> +
> +	/*
> +	 * Flush handling of offer messages (which may initiate work on
> +	 * other work queues).
> +	 */
> +	flush_workqueue(vmbus_connection.work_queue);
> +
> +	/* Flush processing the incoming offers. */
> +	flush_workqueue(vmbus_connection.handle_primary_chan_wq);
> +	flush_workqueue(vmbus_connection.handle_sub_chan_wq);

Why does the sub-channel workqueue need to be flushed? Sub-channels
get created at the request of the Linux driver, in cooperation with the VSP
on the Hyper-V side. If the top-level goal is for user-space drivers to be
able to know that at least a core set of offers have been processed, it
seems like waiting for sub-channel offers isn't necessary. And new
subchannel offers could arrive immediately after this flush, so the flush
doesn't provide any guarantee about whether there are offers in that
workqueue. If there is a valid reason to wait, some explanatory
comments would be helpful.

Michael

> +
>  cleanup:
>  	kfree(msginfo);
> 
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index f001ae880e1d..8351360bba16 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -34,8 +34,8 @@ struct vmbus_connection vmbus_connection = {
> 
>  	.ready_for_suspend_event = COMPLETION_INITIALIZER(
>  				  vmbus_connection.ready_for_suspend_event),
> -	.ready_for_resume_event	= COMPLETION_INITIALIZER(
> -				  vmbus_connection.ready_for_resume_event),
> +	.all_offers_delivered_event = COMPLETION_INITIALIZER(
> +				  vmbus_connection.all_offers_delivered_event),
>  };
>  EXPORT_SYMBOL_GPL(vmbus_connection);
> 
> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
> index d2856023d53c..80cc65dac740 100644
> --- a/drivers/hv/hyperv_vmbus.h
> +++ b/drivers/hv/hyperv_vmbus.h
> @@ -287,18 +287,10 @@ struct vmbus_connection {
>  	struct completion ready_for_suspend_event;
> 
>  	/*
> -	 * The number of primary channels that should be "fixed up"
> -	 * upon resume: these channels are re-offered upon resume, and some
> -	 * fields of the channel offers (i.e. child_relid and connection_id)
> -	 * can change, so the old offermsg must be fixed up, before the resume
> -	 * callbacks of the VSC drivers start to further touch the channels.
> +	 * Completed once the host has offered all channels. Note that
> +	 * some channels may still be being process on a work queue.
>  	 */
> -	atomic_t nr_chan_fixup_on_resume;
> -	/*
> -	 * vmbus_bus_resume() waits for "nr_chan_fixup_on_resume" to
> -	 * drop to zero.
> -	 */
> -	struct completion ready_for_resume_event;
> +	struct completion all_offers_delivered_event;
>  };
> 
> 
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index 9b15f7daf505..bd3fc41dc06b 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -2427,11 +2427,6 @@ static int vmbus_bus_suspend(struct device *dev)
>  	if (atomic_read(&vmbus_connection.nr_chan_close_on_suspend) > 0)
>  		wait_for_completion(&vmbus_connection.ready_for_suspend_event);
> 
> -	if (atomic_read(&vmbus_connection.nr_chan_fixup_on_resume) != 0) {
> -		pr_err("Can not suspend due to a previous failed resuming\n");
> -		return -EBUSY;
> -	}
> -
>  	mutex_lock(&vmbus_connection.channel_mutex);
> 
>  	list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
> @@ -2456,17 +2451,12 @@ static int vmbus_bus_suspend(struct device *dev)
>  			pr_err("Sub-channel not deleted!\n");
>  			WARN_ON_ONCE(1);
>  		}
> -
> -		atomic_inc(&vmbus_connection.nr_chan_fixup_on_resume);
>  	}
> 
>  	mutex_unlock(&vmbus_connection.channel_mutex);
> 
>  	vmbus_initiate_unload(false);
> 
> -	/* Reset the event for the next resume. */
> -	reinit_completion(&vmbus_connection.ready_for_resume_event);
> -
>  	return 0;
>  }
> 
> @@ -2502,14 +2492,8 @@ static int vmbus_bus_resume(struct device *dev)
>  	if (ret != 0)
>  		return ret;
> 
> -	WARN_ON(atomic_read(&vmbus_connection.nr_chan_fixup_on_resume) == 0);
> -
>  	vmbus_request_offers();
> 
> -	if (wait_for_completion_timeout(
> -		&vmbus_connection.ready_for_resume_event, 10 * HZ) == 0)
> -		pr_err("Some vmbus device is missing after suspending?\n");
> -
>  	/* Reset the event for the next suspend. */
>  	reinit_completion(&vmbus_connection.ready_for_suspend_event);
> 
> --
> 2.34.1
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] Drivers: hv: vmbus: Wait for offers during boot
  2024-10-18 11:58 ` [PATCH 1/2] Drivers: hv: vmbus: Wait for offers during boot Naman Jain
  2024-10-18 22:52   ` Easwar Hariharan
  2024-10-21  4:33   ` Michael Kelley
@ 2024-10-21  4:57   ` Saurabh Singh Sengar
  2024-10-22  9:55     ` Naman Jain
  2 siblings, 1 reply; 17+ messages in thread
From: Saurabh Singh Sengar @ 2024-10-21  4:57 UTC (permalink / raw)
  To: Naman Jain
  Cc: K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	linux-hyperv, linux-kernel, John Starks, jacob.pan,
	Easwar Hariharan

On Fri, Oct 18, 2024 at 04:58:10AM -0700, Naman Jain wrote:
> Channels offers are requested during vmbus initialization and resume

Nit: s/vmbus/VMBus

> from hibernation. Add support to wait for all channel offers to be
> delivered and processed before returning from vmbus_request_offers.
> This is to support user mode (VTL0) in OpenHCL (A Linux based
> paravisor for Confidential VMs) to ensure that all channel offers
> are present when init begins in VTL0, and it knows which channels
> the host has offered and which it has not.

Usermode isn't necessarily of VTL0, and this issue was actually identified
at a higher VTL in OpenHCL. However, this change isn't specific to OpenHCL,
but is intended for general use. I would prefer if the commit message were
either more generic or precisely aligned with the specific issue it's
addressing.

> 
> This is in analogy to a PCI bus not returning from probe until it has
> scanned all devices on the bus.
> 
> Without this, user mode can race with vmbus initialization and miss
> channel offers. User mode has no way to work around this other than
> sleeping for a while, since there is no way to know when vmbus has
> finished processing offers.
> 
> With this added functionality, remove earlier logic which keeps track
> of count of offered channels post resume from hibernation. Once all
> offers delivered message is received, no further offers are going to
> be received. Consequently, logic to prevent suspend from happening
> after previous resume had missing offers, is also removed.
> 
> Co-developed-by: John Starks <jostarks@microsoft.com>
> Signed-off-by: John Starks <jostarks@microsoft.com>
> Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
> ---
>  drivers/hv/channel_mgmt.c | 38 +++++++++++++++++++++++---------------
>  drivers/hv/connection.c   |  4 ++--
>  drivers/hv/hyperv_vmbus.h | 14 +++-----------
>  drivers/hv/vmbus_drv.c    | 16 ----------------
>  4 files changed, 28 insertions(+), 44 deletions(-)
> 
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index 3c6011a48dab..ac514805dafe 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -944,16 +944,6 @@ void vmbus_initiate_unload(bool crash)
>  		vmbus_wait_for_unload();
>  }
>  
> -static void check_ready_for_resume_event(void)
> -{
> -	/*
> -	 * If all the old primary channels have been fixed up, then it's safe
> -	 * to resume.
> -	 */
> -	if (atomic_dec_and_test(&vmbus_connection.nr_chan_fixup_on_resume))
> -		complete(&vmbus_connection.ready_for_resume_event);
> -}
> -
>  static void vmbus_setup_channel_state(struct vmbus_channel *channel,
>  				      struct vmbus_channel_offer_channel *offer)
>  {
> @@ -1109,8 +1099,6 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
>  
>  		/* Add the channel back to the array of channels. */
>  		vmbus_channel_map_relid(oldchannel);
> -		check_ready_for_resume_event();
> -
>  		mutex_unlock(&vmbus_connection.channel_mutex);
>  		return;
>  	}
> @@ -1297,12 +1285,11 @@ EXPORT_SYMBOL_GPL(vmbus_hvsock_device_unregister);
>  /*
>   * vmbus_onoffers_delivered -
>   * This is invoked when all offers have been delivered.
> - *
> - * Nothing to do here.
>   */
>  static void vmbus_onoffers_delivered(
>  			struct vmbus_channel_message_header *hdr)
>  {
> +	complete(&vmbus_connection.all_offers_delivered_event);
>  }
>  
>  /*
> @@ -1578,7 +1565,8 @@ void vmbus_onmessage(struct vmbus_channel_message_header *hdr)
>  }
>  
>  /*
> - * vmbus_request_offers - Send a request to get all our pending offers.
> + * vmbus_request_offers - Send a request to get all our pending offers
> + * and wait for all offers to arrive.
>   */
>  int vmbus_request_offers(void)
>  {
> @@ -1596,6 +1584,10 @@ int vmbus_request_offers(void)
>  
>  	msg->msgtype = CHANNELMSG_REQUESTOFFERS;
>  
> +	/*
> +	 * This REQUESTOFFERS message will result in the host sending an all
> +	 * offers delivered message.
> +	 */
>  	ret = vmbus_post_msg(msg, sizeof(struct vmbus_channel_message_header),
>  			     true);
>  
> @@ -1607,6 +1599,22 @@ int vmbus_request_offers(void)
>  		goto cleanup;
>  	}
>  
> +	/* Wait for the host to send all offers. */
> +	while (wait_for_completion_timeout(
> +		&vmbus_connection.all_offers_delivered_event, msecs_to_jiffies(10 * 1000)) == 0) {

Nit: Can simply put 10000 instead of 10*1000

> +		pr_warn("timed out waiting for all offers to be delivered...\n");

I know we are moving from async to sync, so earlier we never checked this.
But what if some channel timed out do we want to handle this case ? Or put
a comment why this is OK.

We could set error from here as well, but I see vmbus_request_offers return value
is never checked.

> +	}
> +
> +	/*
> +	 * Flush handling of offer messages (which may initiate work on
> +	 * other work queues).
> +	 */
> +	flush_workqueue(vmbus_connection.work_queue);
> +
> +	/* Flush processing the incoming offers. */
> +	flush_workqueue(vmbus_connection.handle_primary_chan_wq);
> +	flush_workqueue(vmbus_connection.handle_sub_chan_wq);
> +
>  cleanup:
>  	kfree(msginfo);
>  
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index f001ae880e1d..8351360bba16 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -34,8 +34,8 @@ struct vmbus_connection vmbus_connection = {
>  
>  	.ready_for_suspend_event = COMPLETION_INITIALIZER(
>  				  vmbus_connection.ready_for_suspend_event),
> -	.ready_for_resume_event	= COMPLETION_INITIALIZER(
> -				  vmbus_connection.ready_for_resume_event),
> +	.all_offers_delivered_event = COMPLETION_INITIALIZER(
> +				  vmbus_connection.all_offers_delivered_event),
>  };
>  EXPORT_SYMBOL_GPL(vmbus_connection);
>  
> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
> index d2856023d53c..80cc65dac740 100644
> --- a/drivers/hv/hyperv_vmbus.h
> +++ b/drivers/hv/hyperv_vmbus.h
> @@ -287,18 +287,10 @@ struct vmbus_connection {
>  	struct completion ready_for_suspend_event;
>  
>  	/*
> -	 * The number of primary channels that should be "fixed up"
> -	 * upon resume: these channels are re-offered upon resume, and some
> -	 * fields of the channel offers (i.e. child_relid and connection_id)
> -	 * can change, so the old offermsg must be fixed up, before the resume
> -	 * callbacks of the VSC drivers start to further touch the channels.
> +	 * Completed once the host has offered all channels. Note that
> +	 * some channels may still be being process on a work queue.
>  	 */
> -	atomic_t nr_chan_fixup_on_resume;
> -	/*
> -	 * vmbus_bus_resume() waits for "nr_chan_fixup_on_resume" to
> -	 * drop to zero.
> -	 */
> -	struct completion ready_for_resume_event;
> +	struct completion all_offers_delivered_event;
>  };
>  
>  
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index 9b15f7daf505..bd3fc41dc06b 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -2427,11 +2427,6 @@ static int vmbus_bus_suspend(struct device *dev)
>  	if (atomic_read(&vmbus_connection.nr_chan_close_on_suspend) > 0)
>  		wait_for_completion(&vmbus_connection.ready_for_suspend_event);
>  
> -	if (atomic_read(&vmbus_connection.nr_chan_fixup_on_resume) != 0) {
> -		pr_err("Can not suspend due to a previous failed resuming\n");
> -		return -EBUSY;
> -	}
> -
>  	mutex_lock(&vmbus_connection.channel_mutex);
>  
>  	list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
> @@ -2456,17 +2451,12 @@ static int vmbus_bus_suspend(struct device *dev)
>  			pr_err("Sub-channel not deleted!\n");
>  			WARN_ON_ONCE(1);
>  		}
> -
> -		atomic_inc(&vmbus_connection.nr_chan_fixup_on_resume);
>  	}
>  
>  	mutex_unlock(&vmbus_connection.channel_mutex);
>  
>  	vmbus_initiate_unload(false);
>  
> -	/* Reset the event for the next resume. */
> -	reinit_completion(&vmbus_connection.ready_for_resume_event);
> -
>  	return 0;
>  }
>  
> @@ -2502,14 +2492,8 @@ static int vmbus_bus_resume(struct device *dev)
>  	if (ret != 0)
>  		return ret;
>  
> -	WARN_ON(atomic_read(&vmbus_connection.nr_chan_fixup_on_resume) == 0);
> -
>  	vmbus_request_offers();
>  
> -	if (wait_for_completion_timeout(
> -		&vmbus_connection.ready_for_resume_event, 10 * HZ) == 0)
> -		pr_err("Some vmbus device is missing after suspending?\n");
> -
>  	/* Reset the event for the next suspend. */
>  	reinit_completion(&vmbus_connection.ready_for_suspend_event);
>  
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] Drivers: hv: vmbus: Wait for offers during boot
  2024-10-21  4:33   ` Michael Kelley
@ 2024-10-22  9:50     ` Naman Jain
  2024-10-22 18:03       ` Michael Kelley
  0 siblings, 1 reply; 17+ messages in thread
From: Naman Jain @ 2024-10-22  9:50 UTC (permalink / raw)
  To: Michael Kelley, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui
  Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	John Starks, jacob.pan@linux.microsoft.com, Easwar Hariharan



On 10/21/2024 10:03 AM, Michael Kelley wrote:
> From: Naman Jain <namjain@linux.microsoft.com> Sent: Friday, October 18, 2024 4:58 AM
>>
>> Channels offers are requested during vmbus initialization and resume
>> from hibernation. Add support to wait for all channel offers to be
>> delivered and processed before returning from vmbus_request_offers.
>> This is to support user mode (VTL0) in OpenHCL (A Linux based
>> paravisor for Confidential VMs) to ensure that all channel offers
>> are present when init begins in VTL0, and it knows which channels
>> the host has offered and which it has not.
>>
>> This is in analogy to a PCI bus not returning from probe until it has
>> scanned all devices on the bus.
>>
>> Without this, user mode can race with vmbus initialization and miss
>> channel offers. User mode has no way to work around this other than
>> sleeping for a while, since there is no way to know when vmbus has
>> finished processing offers.
>>
>> With this added functionality, remove earlier logic which keeps track
>> of count of offered channels post resume from hibernation. Once all
>> offers delivered message is received, no further offers are going to
>> be received. Consequently, logic to prevent suspend from happening
>> after previous resume had missing offers, is also removed.
>>
>> Co-developed-by: John Starks <jostarks@microsoft.com>
>> Signed-off-by: John Starks <jostarks@microsoft.com>
>> Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
>> ---
>>   drivers/hv/channel_mgmt.c | 38 +++++++++++++++++++++++---------------
>>   drivers/hv/connection.c   |  4 ++--
>>   drivers/hv/hyperv_vmbus.h | 14 +++-----------
>>   drivers/hv/vmbus_drv.c    | 16 ----------------
>>   4 files changed, 28 insertions(+), 44 deletions(-)
>>
>> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
>> index 3c6011a48dab..ac514805dafe 100644
>> --- a/drivers/hv/channel_mgmt.c
>> +++ b/drivers/hv/channel_mgmt.c
>> @@ -944,16 +944,6 @@ void vmbus_initiate_unload(bool crash)
>>   		vmbus_wait_for_unload();
>>   }
>>
>> -static void check_ready_for_resume_event(void)
>> -{
>> -	/*
>> -	 * If all the old primary channels have been fixed up, then it's safe
>> -	 * to resume.
>> -	 */
>> -	if (atomic_dec_and_test(&vmbus_connection.nr_chan_fixup_on_resume))
>> -		complete(&vmbus_connection.ready_for_resume_event);
>> -}
>> -
>>   static void vmbus_setup_channel_state(struct vmbus_channel *channel,
>>   				      struct vmbus_channel_offer_channel *offer)
>>   {
>> @@ -1109,8 +1099,6 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
>>
>>   		/* Add the channel back to the array of channels. */
>>   		vmbus_channel_map_relid(oldchannel);
>> -		check_ready_for_resume_event();
>> -
>>   		mutex_unlock(&vmbus_connection.channel_mutex);
>>   		return;
>>   	}
>> @@ -1297,12 +1285,11 @@
>> EXPORT_SYMBOL_GPL(vmbus_hvsock_device_unregister);
>>   /*
>>    * vmbus_onoffers_delivered -
>>    * This is invoked when all offers have been delivered.
>> - *
>> - * Nothing to do here.
>>    */
> 
> I'm unclear on the meaning of the ALLOFFERS_DELIVERED message. In the
> early days of Hyper-V, the set of virtual devices in a VM was fixed before a
> VM started, so presumably ALLOFFERS_DELIVERED meant that offers for
> that fixed set of devices had been delivered. That meaning makes sense
> conceptually.
> 
> But more recent versions of Hyper-V support adding and removing devices
> at any time during the life of the VM. If a device is added, a new offer is
> generated. Existing devices (such as netvsc) can reconfigure their channels,
> in which case new subchannel offers are generated. So the core concept of
> "all offers delivered" isn't valid anymore.
> 
> To date Linux hasn't done anything with this message, so we didn't need
> precision about what it means. There's no VMBus specification or
> documentation, so we need some text here in the comments that
> explains precisely what ALLOFFERS_DELIVERED means, and how that
> meaning aligns with the fact that offers can be delivered anytime during
> the life of the VM.
> 
> I'll note that in the past there's also been some behavior where during guest
> boot, Hyper-V offers a virtual PCI device to the guest, immediately
> rescinds the vPCI device (before Linux even finished processing the offer),
> then offers it again. I wonder about how ALLOFFERS_DELIVERED interacts
> with that situation.
> 
> I ran some tests in an Azure L16s_v3 VM, which has three vPCI devices:
> 2 NVMe devices and 1 Mellanox CX-5. The offers for the 2 NVMe devices
> came before the ALLOFFERS_DELIVERED message, but the offer for
> the Mellanox CX-5 came *after* the ALLOFFERS_DELIVERED message.
> If that's the way the Mellanox CX-5 offers work, this patch breaks things
> in the hibernation resume path because ALLOFFERS_DELIVERED isn't
> sufficient to indicate that all primary channels have been re-offered
> and the resume can proceed. All sub-channel offers came after the
> ALLOFFERS_DELIVERED message, but that is what I expected and
> shouldn't be a problem.
> 
> It's hard for me to review this patch set without some precision around
> what ALLOFFERS_DELIVERED means.

Thanks for your valuable comments.
I checked this internally with John and here is the information you are
looking for.

CHANNELMSG_ALLOFFERS_DELIVERED message arrives after all the boot-time
offers are delivered. Other channels can be hot added later, even
immediately after the all-offers-delivered message.

A boot-time offer will be any of the virtual hardware the VM is
configured with at boot. The reason the Mellanox NIC is not included in
this list is because the Mellanoc NIC is an optional accelerator to the
synthetic vmbus NIC. It is hot added only after the vmbus NIC channel is
opened (once we know the guest can support it, via the sriov bit in the
netvsc protocol). So, it is not there at boot time, as we define it.

The reason the logs show the ordering below is that the vmbus driver
queues offer work to a work queue. The logs are out of order from the
actual messages that arrive. That’s why we have to flush the work queues
before we consider all the offers handled.

Regarding your comment for MLX CX-5, it is by design.The MLX device must
be logically hot removed and hot added back across a hibernate. There is
no guarantee the same accelerator device is present with the same
capabilities or device IDs or anything; the VHD could have been moved to
another machine with a different NIC, or to a VM without accelnet, or
whatever. This is all supported. So, it's not required to wait for the
MLX device to come back as part of hibernate resume.

Since Linux doesn't currently handle this correctly, the host works
around this by removing the MLX device before the guest hibernates. This
way, the guest does not expect an MLX device to still be present on
resume. This is inconvenient because it means the guest cannot initiate
hibernate on its own.

The behavior we want is for the guest to hot remove the MLX device
driver on resume, even if the MLX device was still present at suspend,
so that the host does not need this special pre-hibernate behavior. This
patch series may not be sufficient to ensure this, though. It just moves
things in the right direction, by handling the all-offers-delivered
message.

> 
>>   static void vmbus_onoffers_delivered(
>>   			struct vmbus_channel_message_header *hdr)
>>   {
>> +	complete(&vmbus_connection.all_offers_delivered_event);
>>   }
>>
>>   /*
>> @@ -1578,7 +1565,8 @@ void vmbus_onmessage(struct vmbus_channel_message_header *hdr)
>>   }
>>
>>   /*
>> - * vmbus_request_offers - Send a request to get all our pending offers.
>> + * vmbus_request_offers - Send a request to get all our pending offers
>> + * and wait for all offers to arrive.
>>    */
>>   int vmbus_request_offers(void)
>>   {
>> @@ -1596,6 +1584,10 @@ int vmbus_request_offers(void)
>>
>>   	msg->msgtype = CHANNELMSG_REQUESTOFFERS;
>>
>> +	/*
>> +	 * This REQUESTOFFERS message will result in the host sending an all
>> +	 * offers delivered message.
>> +	 */
>>   	ret = vmbus_post_msg(msg, sizeof(struct vmbus_channel_message_header),
>>   			     true);
>>
>> @@ -1607,6 +1599,22 @@ int vmbus_request_offers(void)
>>   		goto cleanup;
>>   	}
>>
>> +	/* Wait for the host to send all offers. */
>> +	while (wait_for_completion_timeout(
>> +		&vmbus_connection.all_offers_delivered_event, msecs_to_jiffies(10 * 1000)) == 0) {
> 
> Maybe do !wait_for_completion_timeout( ...) instead of explicitly testing
> for 0? I see that some existing code uses the explicit test for 0, but that's
> not the usual kernel code idiom.
> 
>> +		pr_warn("timed out waiting for all offers to be delivered...\n");
>> +	}
> 
> The while loop could continue forever. We don't want to trust the Hyper-V
> host to be well-behaved, so there should be an uber timeout where it gives
> up after 100 seconds (or pick some value). If the uber timeout is reached,
> I'm not sure if the code should panic or just go on -- it's debatable. But doing
> something explicit is probably better than repeatedly outputting the
> warning message forever.

You're right its debatable. It seems to be a best effort mechanism, and
we can't say for sure what all offers are going to be provided. So
printing a warning and moving ahead seems right to me. I'll remove the
loop, and keep it simple.

> 
>> +
>> +	/*
>> +	 * Flush handling of offer messages (which may initiate work on
>> +	 * other work queues).
>> +	 */
>> +	flush_workqueue(vmbus_connection.work_queue);
>> +
>> +	/* Flush processing the incoming offers. */
>> +	flush_workqueue(vmbus_connection.handle_primary_chan_wq);
>> +	flush_workqueue(vmbus_connection.handle_sub_chan_wq);
> 
> Why does the sub-channel workqueue need to be flushed? Sub-channels
> get created at the request of the Linux driver, in cooperation with the VSP
> on the Hyper-V side. If the top-level goal is for user-space drivers to be
> able to know that at least a core set of offers have been processed, it
> seems like waiting for sub-channel offers isn't necessary. And new
> subchannel offers could arrive immediately after this flush, so the flush
> doesn't provide any guarantee about whether there are offers in that
> workqueue. If there is a valid reason to wait, some explanatory
> comments would be helpful.

I'll either remove it, or put a comment to manage the expectations that
this is just for best effort and there is no guarantee that all sub
channel offers will be entertained by now.

> 
> Michael
> 
>> +
>>   cleanup:
>>   	kfree(msginfo);
>>
>> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
>> index f001ae880e1d..8351360bba16 100644
>> --- a/drivers/hv/connection.c
>> +++ b/drivers/hv/connection.c
>> @@ -34,8 +34,8 @@ struct vmbus_connection vmbus_connection = {
>>
>>   	.ready_for_suspend_event = COMPLETION_INITIALIZER(
>>   				  vmbus_connection.ready_for_suspend_event),
>> -	.ready_for_resume_event	= COMPLETION_INITIALIZER(
>> -				  vmbus_connection.ready_for_resume_event),
>> +	.all_offers_delivered_event = COMPLETION_INITIALIZER(
>> +				  vmbus_connection.all_offers_delivered_event),
>>   };
>>   EXPORT_SYMBOL_GPL(vmbus_connection);
>>
>> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
>> index d2856023d53c..80cc65dac740 100644
>> --- a/drivers/hv/hyperv_vmbus.h
>> +++ b/drivers/hv/hyperv_vmbus.h
>> @@ -287,18 +287,10 @@ struct vmbus_connection {
>>   	struct completion ready_for_suspend_event;
>>
>>   	/*
>> -	 * The number of primary channels that should be "fixed up"
>> -	 * upon resume: these channels are re-offered upon resume, and some
>> -	 * fields of the channel offers (i.e. child_relid and connection_id)
>> -	 * can change, so the old offermsg must be fixed up, before the resume
>> -	 * callbacks of the VSC drivers start to further touch the channels.
>> +	 * Completed once the host has offered all channels. Note that
>> +	 * some channels may still be being process on a work queue.
>>   	 */
>> -	atomic_t nr_chan_fixup_on_resume;
>> -	/*
>> -	 * vmbus_bus_resume() waits for "nr_chan_fixup_on_resume" to
>> -	 * drop to zero.
>> -	 */
>> -	struct completion ready_for_resume_event;
>> +	struct completion all_offers_delivered_event;
>>   };
>>
>>
>> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
>> index 9b15f7daf505..bd3fc41dc06b 100644
>> --- a/drivers/hv/vmbus_drv.c
>> +++ b/drivers/hv/vmbus_drv.c
>> @@ -2427,11 +2427,6 @@ static int vmbus_bus_suspend(struct device *dev)
>>   	if (atomic_read(&vmbus_connection.nr_chan_close_on_suspend) > 0)
>>   		wait_for_completion(&vmbus_connection.ready_for_suspend_event);
>>
>> -	if (atomic_read(&vmbus_connection.nr_chan_fixup_on_resume) != 0) {
>> -		pr_err("Can not suspend due to a previous failed resuming\n");
>> -		return -EBUSY;
>> -	}
>> -
>>   	mutex_lock(&vmbus_connection.channel_mutex);
>>
>>   	list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
>> @@ -2456,17 +2451,12 @@ static int vmbus_bus_suspend(struct device *dev)
>>   			pr_err("Sub-channel not deleted!\n");
>>   			WARN_ON_ONCE(1);
>>   		}
>> -
>> -		atomic_inc(&vmbus_connection.nr_chan_fixup_on_resume);
>>   	}
>>
>>   	mutex_unlock(&vmbus_connection.channel_mutex);
>>
>>   	vmbus_initiate_unload(false);
>>
>> -	/* Reset the event for the next resume. */
>> -	reinit_completion(&vmbus_connection.ready_for_resume_event);
>> -
>>   	return 0;
>>   }
>>
>> @@ -2502,14 +2492,8 @@ static int vmbus_bus_resume(struct device *dev)
>>   	if (ret != 0)
>>   		return ret;
>>
>> -	WARN_ON(atomic_read(&vmbus_connection.nr_chan_fixup_on_resume) == 0);
>> -
>>   	vmbus_request_offers();
>>
>> -	if (wait_for_completion_timeout(
>> -		&vmbus_connection.ready_for_resume_event, 10 * HZ) == 0)
>> -		pr_err("Some vmbus device is missing after suspending?\n");
>> -
>>   	/* Reset the event for the next suspend. */
>>   	reinit_completion(&vmbus_connection.ready_for_suspend_event);
>>
>> --
>> 2.34.1
>>
> 

Regards,
Naman


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] Drivers: hv: vmbus: Wait for offers during boot
  2024-10-21  4:57   ` Saurabh Singh Sengar
@ 2024-10-22  9:55     ` Naman Jain
  2024-10-22 17:38       ` Michael Kelley
  0 siblings, 1 reply; 17+ messages in thread
From: Naman Jain @ 2024-10-22  9:55 UTC (permalink / raw)
  To: Saurabh Singh Sengar
  Cc: K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	linux-hyperv, linux-kernel, John Starks, jacob.pan,
	Easwar Hariharan



On 10/21/2024 10:27 AM, Saurabh Singh Sengar wrote:
> On Fri, Oct 18, 2024 at 04:58:10AM -0700, Naman Jain wrote:
>> Channels offers are requested during vmbus initialization and resume
> 
> Nit: s/vmbus/VMBus

Thanks for reviewing Saurabh.

Noted this for next patch. Thanks

> 
>> from hibernation. Add support to wait for all channel offers to be
>> delivered and processed before returning from vmbus_request_offers.
>> This is to support user mode (VTL0) in OpenHCL (A Linux based
>> paravisor for Confidential VMs) to ensure that all channel offers
>> are present when init begins in VTL0, and it knows which channels
>> the host has offered and which it has not.
> 
> Usermode isn't necessarily of VTL0, and this issue was actually identified
> at a higher VTL in OpenHCL. However, this change isn't specific to OpenHCL,
> but is intended for general use. I would prefer if the commit message were
> either more generic or precisely aligned with the specific issue it's
> addressing.
> 

I'll make it generic.

>>
>> This is in analogy to a PCI bus not returning from probe until it has
>> scanned all devices on the bus.
>>
>> Without this, user mode can race with vmbus initialization and miss
>> channel offers. User mode has no way to work around this other than
>> sleeping for a while, since there is no way to know when vmbus has
>> finished processing offers.
>>
>> With this added functionality, remove earlier logic which keeps track
>> of count of offered channels post resume from hibernation. Once all
>> offers delivered message is received, no further offers are going to
>> be received. Consequently, logic to prevent suspend from happening
>> after previous resume had missing offers, is also removed.
>>
>> Co-developed-by: John Starks <jostarks@microsoft.com>
>> Signed-off-by: John Starks <jostarks@microsoft.com>
>> Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
>> ---
>>   drivers/hv/channel_mgmt.c | 38 +++++++++++++++++++++++---------------
>>   drivers/hv/connection.c   |  4 ++--
>>   drivers/hv/hyperv_vmbus.h | 14 +++-----------

<..>

>>   	}
>>   
>> +	/* Wait for the host to send all offers. */
>> +	while (wait_for_completion_timeout(
>> +		&vmbus_connection.all_offers_delivered_event, msecs_to_jiffies(10 * 1000)) == 0) {
> 
> Nit: Can simply put 10000 instead of 10*1000

Noted.

> 
>> +		pr_warn("timed out waiting for all offers to be delivered...\n");
> 
> I know we are moving from async to sync, so earlier we never checked this.
> But what if some channel timed out do we want to handle this case ? Or put
> a comment why this is OK.
> 
> We could set error from here as well, but I see vmbus_request_offers return value
> is never checked.

It seems to be a best effort way to get all the offers. If its not
received in time, I think we can print a warning and continue. I can add
that to a comment.

> 
>> +	}
>> +
>> +	/*
>> +	 * Flush handling of offer messages (which may initiate work on
>> +	 * other work queues).
>> +	 */

Regards,
Naman

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH 1/2] Drivers: hv: vmbus: Wait for offers during boot
  2024-10-22  9:55     ` Naman Jain
@ 2024-10-22 17:38       ` Michael Kelley
  2024-10-24 10:32         ` Naman Jain
  0 siblings, 1 reply; 17+ messages in thread
From: Michael Kelley @ 2024-10-22 17:38 UTC (permalink / raw)
  To: Naman Jain, Saurabh Singh Sengar
  Cc: K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	John Starks, jacob.pan@linux.microsoft.com, Easwar Hariharan

From: Naman Jain <namjain@linux.microsoft.com> Sent: Tuesday, October 22, 2024 2:55 AM
> 
> On 10/21/2024 10:27 AM, Saurabh Singh Sengar wrote:
> > On Fri, Oct 18, 2024 at 04:58:10AM -0700, Naman Jain wrote:

[snip]

> >>
> >> +	/* Wait for the host to send all offers. */
> >> +	while (wait_for_completion_timeout(
> >> +		&vmbus_connection.all_offers_delivered_event, msecs_to_jiffies(10 * 1000)) == 0) {
> >
> > Nit: Can simply put 10000 instead of 10*1000
> 
> Noted.
> 

If Easwar's patch can get "secs_to_jiffies()" added, that would be even better. :-)

Michael

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH 1/2] Drivers: hv: vmbus: Wait for offers during boot
  2024-10-22  9:50     ` Naman Jain
@ 2024-10-22 18:03       ` Michael Kelley
  2024-10-25 18:18         ` Dexuan Cui
  2024-10-28  4:54         ` Naman Jain
  0 siblings, 2 replies; 17+ messages in thread
From: Michael Kelley @ 2024-10-22 18:03 UTC (permalink / raw)
  To: Naman Jain, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui
  Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	John Starks, jacob.pan@linux.microsoft.com, Easwar Hariharan

From: Naman Jain <namjain@linux.microsoft.com> Sent: Tuesday, October 22, 2024 2:50 AM
> 
> On 10/21/2024 10:03 AM, Michael Kelley wrote:
> > From: Naman Jain <namjain@linux.microsoft.com> Sent: Friday, October 18, 2024 4:58 AM

[snip]

> >> @@ -1297,12 +1285,11 @@
> >> EXPORT_SYMBOL_GPL(vmbus_hvsock_device_unregister);
> >>   /*
> >>    * vmbus_onoffers_delivered -
> >>    * This is invoked when all offers have been delivered.
> >> - *
> >> - * Nothing to do here.
> >>    */
> >
> > I'm unclear on the meaning of the ALLOFFERS_DELIVERED message. In the
> > early days of Hyper-V, the set of virtual devices in a VM was fixed before a
> > VM started, so presumably ALLOFFERS_DELIVERED meant that offers for
> > that fixed set of devices had been delivered. That meaning makes sense
> > conceptually.
> >
> > But more recent versions of Hyper-V support adding and removing devices
> > at any time during the life of the VM. If a device is added, a new offer is
> > generated. Existing devices (such as netvsc) can reconfigure their channels,
> > in which case new subchannel offers are generated. So the core concept of
> > "all offers delivered" isn't valid anymore.
> >
> > To date Linux hasn't done anything with this message, so we didn't need
> > precision about what it means. There's no VMBus specification or
> > documentation, so we need some text here in the comments that
> > explains precisely what ALLOFFERS_DELIVERED means, and how that
> > meaning aligns with the fact that offers can be delivered anytime during
> > the life of the VM.
> >
> > I'll note that in the past there's also been some behavior where during guest
> > boot, Hyper-V offers a virtual PCI device to the guest, immediately
> > rescinds the vPCI device (before Linux even finished processing the offer),
> > then offers it again. I wonder about how ALLOFFERS_DELIVERED interacts
> > with that situation.
> >
> > I ran some tests in an Azure L16s_v3 VM, which has three vPCI devices:
> > 2 NVMe devices and 1 Mellanox CX-5. The offers for the 2 NVMe devices
> > came before the ALLOFFERS_DELIVERED message, but the offer for
> > the Mellanox CX-5 came *after* the ALLOFFERS_DELIVERED message.
> > If that's the way the Mellanox CX-5 offers work, this patch breaks things
> > in the hibernation resume path because ALLOFFERS_DELIVERED isn't
> > sufficient to indicate that all primary channels have been re-offered
> > and the resume can proceed. All sub-channel offers came after the
> > ALLOFFERS_DELIVERED message, but that is what I expected and
> > shouldn't be a problem.
> >
> > It's hard for me to review this patch set without some precision around
> > what ALLOFFERS_DELIVERED means.
> 
> Thanks for your valuable comments.
> I checked this internally with John and here is the information you are
> looking for.
> 
> CHANNELMSG_ALLOFFERS_DELIVERED message arrives after all the boot-time
> offers are delivered. Other channels can be hot added later, even
> immediately after the all-offers-delivered message.
> 
> A boot-time offer will be any of the virtual hardware the VM is
> configured with at boot. The reason the Mellanox NIC is not included in
> this list is because the Mellanoc NIC is an optional accelerator to the
> synthetic vmbus NIC. It is hot added only after the vmbus NIC channel is
> opened (once we know the guest can support it, via the sriov bit in the
> netvsc protocol). So, it is not there at boot time, as we define it.

Good info about the meaning of ALLOFFERS_DELIVERED! Please
capture the info in comments in the code so it's available for future
readers.

> 
> The reason the logs show the ordering below is that the vmbus driver
> queues offer work to a work queue. The logs are out of order from the
> actual messages that arrive. That’s why we have to flush the work queues
> before we consider all the offers handled.

I'm not sure which logs are referenced in the above paragraph. When
I captured logs (which I didn't send), I did so in vmbus_on_msg_dpc(),
which is before the offer messages are put in the work queue. So my
logs reflected the order of the actual messages. But I agree that the
work queues need to be flushed.

> 
> Regarding your comment for MLX CX-5, it is by design.The MLX device must
> be logically hot removed and hot added back across a hibernate. There is
> no guarantee the same accelerator device is present with the same
> capabilities or device IDs or anything; the VHD could have been moved to
> another machine with a different NIC, or to a VM without accelnet, or
> whatever. This is all supported. So, it's not required to wait for the
> MLX device to come back as part of hibernate resume.
> 
> Since Linux doesn't currently handle this correctly, the host works
> around this by removing the MLX device before the guest hibernates. This
> way, the guest does not expect an MLX device to still be present on
> resume. This is inconvenient because it means the guest cannot initiate
> hibernate on its own.

I wasn't aware of the VF handling. Where does the guest notify the
host that it is planning to hibernate? I went looking for such code, but
couldn't immediately find it.  Is it in the netvsc driver? Is this the
sequence?

1) The guest notifies the host of the hibernate
2) The host sends a RESCIND_CHANNELOFFER message for each VF
    in the VM
3) The guest waits for all VF rescind processing to complete, and
    also must ensure that no new VFs get added in the meantime
4) Then the guest proceeds with the hibernation, knowing that there
    are no open channels for VF devices

> 
> The behavior we want is for the guest to hot remove the MLX device
> driver on resume, even if the MLX device was still present at suspend,
> so that the host does not need this special pre-hibernate behavior. This
> patch series may not be sufficient to ensure this, though. It just moves
> things in the right direction, by handling the all-offers-delivered
> message.

This aspect of the patch might be worth calling out in the commit
message.

> 
> >
> >>   static void vmbus_onoffers_delivered(
> >>   			struct vmbus_channel_message_header *hdr)
> >>   {
> >> +	complete(&vmbus_connection.all_offers_delivered_event);
> >>   }
> >>
> >>   /*
> >> @@ -1578,7 +1565,8 @@ void vmbus_onmessage(struct vmbus_channel_message_header *hdr)
> >>   }
> >>
> >>   /*
> >> - * vmbus_request_offers - Send a request to get all our pending offers.
> >> + * vmbus_request_offers - Send a request to get all our pending offers
> >> + * and wait for all offers to arrive.
> >>    */
> >>   int vmbus_request_offers(void)
> >>   {
> >> @@ -1596,6 +1584,10 @@ int vmbus_request_offers(void)
> >>
> >>   	msg->msgtype = CHANNELMSG_REQUESTOFFERS;
> >>
> >> +	/*
> >> +	 * This REQUESTOFFERS message will result in the host sending an all
> >> +	 * offers delivered message.
> >> +	 */
> >>   	ret = vmbus_post_msg(msg, sizeof(struct vmbus_channel_message_header),
> >>   			     true);
> >>
> >> @@ -1607,6 +1599,22 @@ int vmbus_request_offers(void)
> >>   		goto cleanup;
> >>   	}
> >>
> >> +	/* Wait for the host to send all offers. */
> >> +	while (wait_for_completion_timeout(
> >> +		&vmbus_connection.all_offers_delivered_event, msecs_to_jiffies(10 * 1000)) == 0) {
> >
> > Maybe do !wait_for_completion_timeout( ...) instead of explicitly testing
> > for 0? I see that some existing code uses the explicit test for 0, but that's
> > not the usual kernel code idiom.
> >
> >> +		pr_warn("timed out waiting for all offers to be delivered...\n");
> >> +	}
> >
> > The while loop could continue forever. We don't want to trust the Hyper-V
> > host to be well-behaved, so there should be an uber timeout where it gives
> > up after 100 seconds (or pick some value). If the uber timeout is reached,
> > I'm not sure if the code should panic or just go on -- it's debatable. But doing
> > something explicit is probably better than repeatedly outputting the
> > warning message forever.
> 
> You're right its debatable. It seems to be a best effort mechanism, and
> we can't say for sure what all offers are going to be provided. So
> printing a warning and moving ahead seems right to me. I'll remove the
> loop, and keep it simple.
> 
> >
> >> +
> >> +	/*
> >> +	 * Flush handling of offer messages (which may initiate work on
> >> +	 * other work queues).
> >> +	 */
> >> +	flush_workqueue(vmbus_connection.work_queue);
> >> +
> >> +	/* Flush processing the incoming offers. */
> >> +	flush_workqueue(vmbus_connection.handle_primary_chan_wq);
> >> +	flush_workqueue(vmbus_connection.handle_sub_chan_wq);
> >
> > Why does the sub-channel workqueue need to be flushed? Sub-channels
> > get created at the request of the Linux driver, in cooperation with the VSP
> > on the Hyper-V side. If the top-level goal is for user-space drivers to be
> > able to know that at least a core set of offers have been processed, it
> > seems like waiting for sub-channel offers isn't necessary. And new
> > subchannel offers could arrive immediately after this flush, so the flush
> > doesn't provide any guarantee about whether there are offers in that
> > workqueue. If there is a valid reason to wait, some explanatory
> > comments would be helpful.
> 
> I'll either remove it, or put a comment to manage the expectations that
> this is just for best effort and there is no guarantee that all sub
> channel offers will be entertained by now.
> 

I lean toward removing it, as flushing the sub channel work queue
doesn't seem related to the stated goals of the patch. It just leaves a
dangling "why is this flush being done?" question.

Michael

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] Drivers: hv: vmbus: Wait for offers during boot
  2024-10-22 17:38       ` Michael Kelley
@ 2024-10-24 10:32         ` Naman Jain
  0 siblings, 0 replies; 17+ messages in thread
From: Naman Jain @ 2024-10-24 10:32 UTC (permalink / raw)
  To: Michael Kelley, Saurabh Singh Sengar
  Cc: K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	John Starks, jacob.pan@linux.microsoft.com, Easwar Hariharan



On 10/22/2024 11:08 PM, Michael Kelley wrote:
> From: Naman Jain <namjain@linux.microsoft.com> Sent: Tuesday, October 22, 2024 2:55 AM
>>
>> On 10/21/2024 10:27 AM, Saurabh Singh Sengar wrote:
>>> On Fri, Oct 18, 2024 at 04:58:10AM -0700, Naman Jain wrote:
> 
> [snip]
> 
>>>>
>>>> +	/* Wait for the host to send all offers. */
>>>> +	while (wait_for_completion_timeout(
>>>> +		&vmbus_connection.all_offers_delivered_event, msecs_to_jiffies(10 * 1000)) == 0) {
>>>
>>> Nit: Can simply put 10000 instead of 10*1000
>>
>> Noted.
>>
> 
> If Easwar's patch can get "secs_to_jiffies()" added, that would be even better. :-)
> 
> Michael

Yes, that's true :)

Regards,
Naman

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH 1/2] Drivers: hv: vmbus: Wait for offers during boot
  2024-10-22 18:03       ` Michael Kelley
@ 2024-10-25 18:18         ` Dexuan Cui
  2024-10-28 15:21           ` Michael Kelley
  2024-10-28  4:54         ` Naman Jain
  1 sibling, 1 reply; 17+ messages in thread
From: Dexuan Cui @ 2024-10-25 18:18 UTC (permalink / raw)
  To: Michael Kelley, Naman Jain, KY Srinivasan, Haiyang Zhang, Wei Liu
  Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	John Starks, jacob.pan@linux.microsoft.com, Easwar Hariharan

> From: Michael Kelley <mhklinux@outlook.com>
> Sent: Tuesday, October 22, 2024 11:04 AM
> [...]
> I wasn't aware of the VF handling. Where does the guest notify the
> host that it is planning to hibernate? I went looking for such code, but
> couldn't immediately find it.  Is it in the netvsc driver? Is this the
> sequence?
> 
> 1) The guest notifies the host of the hibernate
> 2) The host sends a RESCIND_CHANNELOFFER message for each VF
>     in the VM
> 3) The guest waits for all VF rescind processing to complete, and
>     also must ensure that no new VFs get added in the meantime
> 4) Then the guest proceeds with the hibernation, knowing that there
>     are no open channels for VF devices

When a hibernated VM resumes on a different host, it looks like the host team
thinks that it's difficult to remember the VMBus device Instance GUID for the
VF, and use the same GUID on the new host. When the new host uses a new
Instance GUID for the VF, a Windows VM panics, and a Linux VM prints a
warning and IIRC loses the ability to hibernate again due to a check in the
VMBus driver.

So, as a workaround, the host team decides to remove the VF(s) before
asking the VM to hibernate. The sequence of a "host-initiated VM hibernation"
is:
1) a user clicks the "Hibernation" button on the portal (or uses the equivalent
cmd line or API).

2) Internally, the host temporarily disables AccelNet for the vNICs, i.e. sending
PCI_EJECT and RESCIND_CHANNELOFFER for each VF.

3) The guest responds accordingly, including sending PCI_EJECTION_COMPLETE
and CHANNELMSG_RELID_RELEASED.

4) Once the host knows that AccelNet has been disabled for the VM, the host
Sends a "please hibernate" message to the guest via the Shutdown IC.

5) The guest proceeds with the hibernation, knowing that there are no open
channels for VF devices and assuming no new VF will be offered during the
hibernation operation.

6) When the VM finishes hibernation and powers off, the host internally enables
AccelNet for the VM so that when the VM resumes on a new host, the new host
can offer a VF with a different VMBus device instance GUID.

The above is for a "host-initiated VM hibernation".

Currently, the Azure team doesn't support a "VM-initiated hibernation", where
the host has no opportunity to temporarily disable AccelNet. Maybe 
"VM-initiated hibernation" can be supported when MANA-Direct is used (i.e.
no more NetVSC NICs and there are only MANA VF NICs): in this scenario, I
suppose the host must remember the MANA VF's VMBus device Instance GUID
and use the same GUID on the new host.

> > The behavior we want is for the guest to hot remove the MLX device
> > driver on resume, even if the MLX device was still present at suspend,
> > so that the host does not need this special pre-hibernate behavior. This
> > patch series may not be sufficient to ensure this, though. It just moves
> > things in the right direction, by handling the all-offers-delivered
> > message.

I'm not sure if it's a good idea to add new code to try to remove an 
stale MLX VF since the scenario should not exist on Azure nowadays 
(currently the host temporarily disables AccelNet during hibernation so there
should be no stale MLX VF upon resume.)

On a local Hyper-V host, after a VM hibernates, we can manually disable
AccelNet (i.e. NIC SR-IOV) for the VM, and the VM will see a stale unresponsive
MLX VF upon resume. It would be tricky to clean up the VF gracefully:
we would have to wait for the resume callback in the Mellanox VF driver
to time out on the unresponsive VF (this can take 1 minute) and clean up the
related VMBus pass-through device backing the VF; what happens if a
host-initiated or VM-initiated hibernation is triggered during the 1 minute?
I suspect there may be some tricky race condition issues, e.g. we may 
need to figure out how to synchronize the .resume with the .remove callbacks
of the MLX driver.

I think the general assumption is that the VM's configuration should not
change at all across hibernation, but it looks like this assumption is found
to be false under some conditions from time to time... I wish the assumption
can be always true with OpenHCL.

Thanks,
Dexuan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] Drivers: hv: vmbus: Wait for offers during boot
  2024-10-22 18:03       ` Michael Kelley
  2024-10-25 18:18         ` Dexuan Cui
@ 2024-10-28  4:54         ` Naman Jain
  1 sibling, 0 replies; 17+ messages in thread
From: Naman Jain @ 2024-10-28  4:54 UTC (permalink / raw)
  To: Michael Kelley, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui
  Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	John Starks, jacob.pan@linux.microsoft.com, Easwar Hariharan



On 10/22/2024 11:33 PM, Michael Kelley wrote:
> From: Naman Jain <namjain@linux.microsoft.com> Sent: Tuesday, October 22, 2024 2:50 AM
>>
>> On 10/21/2024 10:03 AM, Michael Kelley wrote:
>>> From: Naman Jain <namjain@linux.microsoft.com> Sent: Friday, October 18, 2024 4:58 AM
> 
> [snip]
> 
>>>> @@ -1297,12 +1285,11 @@
>>>> EXPORT_SYMBOL_GPL(vmbus_hvsock_device_unregister);
>>>>    /*
>>>>     * vmbus_onoffers_delivered -
>>>>     * This is invoked when all offers have been delivered.
>>>> - *
>>>> - * Nothing to do here.
>>>>     */
>>>
>>> I'm unclear on the meaning of the ALLOFFERS_DELIVERED message. In the
>>> early days of Hyper-V, the set of virtual devices in a VM was fixed before a
>>> VM started, so presumably ALLOFFERS_DELIVERED meant that offers for
>>> that fixed set of devices had been delivered. That meaning makes sense
>>> conceptually.
>>>
>>> But more recent versions of Hyper-V support adding and removing devices
>>> at any time during the life of the VM. If a device is added, a new offer is
>>> generated. Existing devices (such as netvsc) can reconfigure their channels,
>>> in which case new subchannel offers are generated. So the core concept of
>>> "all offers delivered" isn't valid anymore.
>>>
>>> To date Linux hasn't done anything with this message, so we didn't need
>>> precision about what it means. There's no VMBus specification or
>>> documentation, so we need some text here in the comments that
>>> explains precisely what ALLOFFERS_DELIVERED means, and how that
>>> meaning aligns with the fact that offers can be delivered anytime during
>>> the life of the VM.
>>>
>>> I'll note that in the past there's also been some behavior where during guest
>>> boot, Hyper-V offers a virtual PCI device to the guest, immediately
>>> rescinds the vPCI device (before Linux even finished processing the offer),
>>> then offers it again. I wonder about how ALLOFFERS_DELIVERED interacts
>>> with that situation.
>>>
>>> I ran some tests in an Azure L16s_v3 VM, which has three vPCI devices:
>>> 2 NVMe devices and 1 Mellanox CX-5. The offers for the 2 NVMe devices
>>> came before the ALLOFFERS_DELIVERED message, but the offer for
>>> the Mellanox CX-5 came *after* the ALLOFFERS_DELIVERED message.
>>> If that's the way the Mellanox CX-5 offers work, this patch breaks things
>>> in the hibernation resume path because ALLOFFERS_DELIVERED isn't
>>> sufficient to indicate that all primary channels have been re-offered
>>> and the resume can proceed. All sub-channel offers came after the
>>> ALLOFFERS_DELIVERED message, but that is what I expected and
>>> shouldn't be a problem.
>>>
>>> It's hard for me to review this patch set without some precision around
>>> what ALLOFFERS_DELIVERED means.
>>
>> Thanks for your valuable comments.
>> I checked this internally with John and here is the information you are
>> looking for.
>>
>> CHANNELMSG_ALLOFFERS_DELIVERED message arrives after all the boot-time
>> offers are delivered. Other channels can be hot added later, even
>> immediately after the all-offers-delivered message.
>>
>> A boot-time offer will be any of the virtual hardware the VM is
>> configured with at boot. The reason the Mellanox NIC is not included in
>> this list is because the Mellanoc NIC is an optional accelerator to the
>> synthetic vmbus NIC. It is hot added only after the vmbus NIC channel is
>> opened (once we know the guest can support it, via the sriov bit in the
>> netvsc protocol). So, it is not there at boot time, as we define it.
> 
> Good info about the meaning of ALLOFFERS_DELIVERED! Please
> capture the info in comments in the code so it's available for future
> readers.

Noted.

> 
>>
>> The reason the logs show the ordering below is that the vmbus driver
>> queues offer work to a work queue. The logs are out of order from the
>> actual messages that arrive. That’s why we have to flush the work queues
>> before we consider all the offers handled.
> 
> I'm not sure which logs are referenced in the above paragraph. When
> I captured logs (which I didn't send), I did so in vmbus_on_msg_dpc(),
> which is before the offer messages are put in the work queue. So my
> logs reflected the order of the actual messages. But I agree that the
> work queues need to be flushed.
> 
>>

My bad.
The logs, which I referred to here, was actually ftrace logs, for 
different channels offer, request for offer and all offers delivered 
event. My observation was then based on how these works are processed in 
workqueues. I was observing that all offers delivered event was being 
processed before channel offers. This was misleading, and sorry for the 
confusion.

>> Regarding your comment for MLX CX-5, it is by design.The MLX device must
>> be logically hot removed and hot added back across a hibernate. There is
>> no guarantee the same accelerator device is present with the same
>> capabilities or device IDs or anything; the VHD could have been moved to
>> another machine with a different NIC, or to a VM without accelnet, or
>> whatever. This is all supported. So, it's not required to wait for the
>> MLX device to come back as part of hibernate resume.
>>
>> Since Linux doesn't currently handle this correctly, the host works
>> around this by removing the MLX device before the guest hibernates. This
>> way, the guest does not expect an MLX device to still be present on
>> resume. This is inconvenient because it means the guest cannot initiate
>> hibernate on its own.
> 
> I wasn't aware of the VF handling. Where does the guest notify the
> host that it is planning to hibernate? I went looking for such code, but
> couldn't immediately find it.  Is it in the netvsc driver? Is this the
> sequence?
> 
> 1) The guest notifies the host of the hibernate
> 2) The host sends a RESCIND_CHANNELOFFER message for each VF
>      in the VM
> 3) The guest waits for all VF rescind processing to complete, and
>      also must ensure that no new VFs get added in the meantime
> 4) Then the guest proceeds with the hibernation, knowing that there
>      are no open channels for VF devices
> 

Dexuan has responded to this query.

>>
>> The behavior we want is for the guest to hot remove the MLX device
>> driver on resume, even if the MLX device was still present at suspend,
>> so that the host does not need this special pre-hibernate behavior. This
>> patch series may not be sufficient to ensure this, though. It just moves
>> things in the right direction, by handling the all-offers-delivered
>> message.
> 
> This aspect of the patch might be worth calling out in the commit
> message.

Will do.

> 
>>
>>>
>>>>    static void vmbus_onoffers_delivered(
>>>>    			struct vmbus_channel_message_header *hdr)
>>>>    {
>>>> +	complete(&vmbus_connection.all_offers_delivered_event);
>>>>    }
>>>>
>>>>    /*
>>>> @@ -1578,7 +1565,8 @@ void vmbus_onmessage(struct vmbus_channel_message_header *hdr)
>>>>    }
>>>>
>>>>    /*
>>>> - * vmbus_request_offers - Send a request to get all our pending offers.
>>>> + * vmbus_request_offers - Send a request to get all our pending offers
>>>> + * and wait for all offers to arrive.
>>>>     */
>>>>    int vmbus_request_offers(void)
>>>>    {
>>>> @@ -1596,6 +1584,10 @@ int vmbus_request_offers(void)
>>>>
>>>>    	msg->msgtype = CHANNELMSG_REQUESTOFFERS;
>>>>
>>>> +	/*
>>>> +	 * This REQUESTOFFERS message will result in the host sending an all
>>>> +	 * offers delivered message.
>>>> +	 */
>>>>    	ret = vmbus_post_msg(msg, sizeof(struct vmbus_channel_message_header),
>>>>    			     true);
>>>>
>>>> @@ -1607,6 +1599,22 @@ int vmbus_request_offers(void)
>>>>    		goto cleanup;
>>>>    	}
>>>>
>>>> +	/* Wait for the host to send all offers. */
>>>> +	while (wait_for_completion_timeout(
>>>> +		&vmbus_connection.all_offers_delivered_event, msecs_to_jiffies(10 * 1000)) == 0) {
>>>
>>> Maybe do !wait_for_completion_timeout( ...) instead of explicitly testing
>>> for 0? I see that some existing code uses the explicit test for 0, but that's
>>> not the usual kernel code idiom.
>>>
>>>> +		pr_warn("timed out waiting for all offers to be delivered...\n");
>>>> +	}
>>>
>>> The while loop could continue forever. We don't want to trust the Hyper-V
>>> host to be well-behaved, so there should be an uber timeout where it gives
>>> up after 100 seconds (or pick some value). If the uber timeout is reached,
>>> I'm not sure if the code should panic or just go on -- it's debatable. But doing
>>> something explicit is probably better than repeatedly outputting the
>>> warning message forever.
>>
>> You're right its debatable. It seems to be a best effort mechanism, and
>> we can't say for sure what all offers are going to be provided. So
>> printing a warning and moving ahead seems right to me. I'll remove the
>> loop, and keep it simple.
>>
>>>
>>>> +
>>>> +	/*
>>>> +	 * Flush handling of offer messages (which may initiate work on
>>>> +	 * other work queues).
>>>> +	 */
>>>> +	flush_workqueue(vmbus_connection.work_queue);
>>>> +
>>>> +	/* Flush processing the incoming offers. */
>>>> +	flush_workqueue(vmbus_connection.handle_primary_chan_wq);
>>>> +	flush_workqueue(vmbus_connection.handle_sub_chan_wq);
>>>
>>> Why does the sub-channel workqueue need to be flushed? Sub-channels
>>> get created at the request of the Linux driver, in cooperation with the VSP
>>> on the Hyper-V side. If the top-level goal is for user-space drivers to be
>>> able to know that at least a core set of offers have been processed, it
>>> seems like waiting for sub-channel offers isn't necessary. And new
>>> subchannel offers could arrive immediately after this flush, so the flush
>>> doesn't provide any guarantee about whether there are offers in that
>>> workqueue. If there is a valid reason to wait, some explanatory
>>> comments would be helpful.
>>
>> I'll either remove it, or put a comment to manage the expectations that
>> this is just for best effort and there is no guarantee that all sub
>> channel offers will be entertained by now.
>>
> 
> I lean toward removing it, as flushing the sub channel work queue
> doesn't seem related to the stated goals of the patch. It just leaves a
> dangling "why is this flush being done?" question.

Understood, will remove it. Thanks.

> 
> Michael

Regards,
Naman

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH 1/2] Drivers: hv: vmbus: Wait for offers during boot
  2024-10-25 18:18         ` Dexuan Cui
@ 2024-10-28 15:21           ` Michael Kelley
  0 siblings, 0 replies; 17+ messages in thread
From: Michael Kelley @ 2024-10-28 15:21 UTC (permalink / raw)
  To: Dexuan Cui, Naman Jain, KY Srinivasan, Haiyang Zhang, Wei Liu
  Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	John Starks, jacob.pan@linux.microsoft.com, Easwar Hariharan

From: Dexuan Cui <decui@microsoft.com> Sent: Friday, October 25, 2024 11:19 AM
> 
> > From: Michael Kelley <mhklinux@outlook.com>
> > Sent: Tuesday, October 22, 2024 11:04 AM
> > [...]
> > I wasn't aware of the VF handling. Where does the guest notify the
> > host that it is planning to hibernate? I went looking for such code, but
> > couldn't immediately find it.  Is it in the netvsc driver? Is this the
> > sequence?
> >
> > 1) The guest notifies the host of the hibernate
> > 2) The host sends a RESCIND_CHANNELOFFER message for each VF
> >     in the VM
> > 3) The guest waits for all VF rescind processing to complete, and
> >     also must ensure that no new VFs get added in the meantime
> > 4) Then the guest proceeds with the hibernation, knowing that there
> >     are no open channels for VF devices
> 
> When a hibernated VM resumes on a different host, it looks like the host team
> thinks that it's difficult to remember the VMBus device Instance GUID for the
> VF, and use the same GUID on the new host. When the new host uses a new
> Instance GUID for the VF, a Windows VM panics, and a Linux VM prints a
> warning and IIRC loses the ability to hibernate again due to a check in the
> VMBus driver.
> 
> So, as a workaround, the host team decides to remove the VF(s) before
> asking the VM to hibernate. The sequence of a "host-initiated VM hibernation"
> is:
> 1) a user clicks the "Hibernation" button on the portal (or uses the equivalent
> cmd line or API).
> 
> 2) Internally, the host temporarily disables AccelNet for the vNICs, i.e. sending
> PCI_EJECT and RESCIND_CHANNELOFFER for each VF.
> 
> 3) The guest responds accordingly, including sending PCI_EJECTION_COMPLETE
> and CHANNELMSG_RELID_RELEASED.
> 
> 4) Once the host knows that AccelNet has been disabled for the VM, the host
> Sends a "please hibernate" message to the guest via the Shutdown IC.
> 
> 5) The guest proceeds with the hibernation, knowing that there are no open
> channels for VF devices and assuming no new VF will be offered during the
> hibernation operation.
> 
> 6) When the VM finishes hibernation and powers off, the host internally enables
> AccelNet for the VM so that when the VM resumes on a new host, the new host
> can offer a VF with a different VMBus device instance GUID.
> 
> The above is for a "host-initiated VM hibernation".
> 
> Currently, the Azure team doesn't support a "VM-initiated hibernation", where
> the host has no opportunity to temporarily disable AccelNet. Maybe
> "VM-initiated hibernation" can be supported when MANA-Direct is used (i.e.
> no more NetVSC NICs and there are only MANA VF NICs): in this scenario, I
> suppose the host must remember the MANA VF's VMBus device Instance GUID
> and use the same GUID on the new host.
> 

Thanks for the information, Dexuan! I'm thinking about hibernation
a bit more, and perhaps will write a Linux kernel documentation topic
under Documentation/virt/hyperv that covers the full set of scenarios.
The Hyper-V interactions and assumptions are more complex than I
had realized. Getting them formally documented should be helpful in
the long run.

Michael 

> > > The behavior we want is for the guest to hot remove the MLX device
> > > driver on resume, even if the MLX device was still present at suspend,
> > > so that the host does not need this special pre-hibernate behavior. This
> > > patch series may not be sufficient to ensure this, though. It just moves
> > > things in the right direction, by handling the all-offers-delivered
> > > message.
> 
> I'm not sure if it's a good idea to add new code to try to remove an
> stale MLX VF since the scenario should not exist on Azure nowadays
> (currently the host temporarily disables AccelNet during hibernation so there
> should be no stale MLX VF upon resume.)
> 
> On a local Hyper-V host, after a VM hibernates, we can manually disable
> AccelNet (i.e. NIC SR-IOV) for the VM, and the VM will see a stale unresponsive
> MLX VF upon resume. It would be tricky to clean up the VF gracefully:
> we would have to wait for the resume callback in the Mellanox VF driver
> to time out on the unresponsive VF (this can take 1 minute) and clean up the
> related VMBus pass-through device backing the VF; what happens if a
> host-initiated or VM-initiated hibernation is triggered during the 1 minute?
> I suspect there may be some tricky race condition issues, e.g. we may
> need to figure out how to synchronize the .resume with the .remove callbacks
> of the MLX driver.
> 
> I think the general assumption is that the VM's configuration should not
> change at all across hibernation, but it looks like this assumption is found
> to be false under some conditions from time to time... I wish the assumption
> can be always true with OpenHCL.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2024-10-28 15:21 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-18 11:58 [PATCH 0/2] Drivers: hv: vmbus: Wait for offers and log missing offers Naman Jain
2024-10-18 11:58 ` [PATCH 1/2] Drivers: hv: vmbus: Wait for offers during boot Naman Jain
2024-10-18 22:52   ` Easwar Hariharan
2024-10-21  4:33   ` Michael Kelley
2024-10-22  9:50     ` Naman Jain
2024-10-22 18:03       ` Michael Kelley
2024-10-25 18:18         ` Dexuan Cui
2024-10-28 15:21           ` Michael Kelley
2024-10-28  4:54         ` Naman Jain
2024-10-21  4:57   ` Saurabh Singh Sengar
2024-10-22  9:55     ` Naman Jain
2024-10-22 17:38       ` Michael Kelley
2024-10-24 10:32         ` Naman Jain
2024-10-18 11:58 ` [PATCH 2/2] Drivers: hv: vmbus: Log on missing offers Naman Jain
2024-10-21  2:43   ` Easwar Hariharan
2024-10-21  4:26   ` Saurabh Singh Sengar
2024-10-21  4:31     ` Naman Jain

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).