Linux-HyperV List

* [PATCH 1/1] mshv: Add conditional VMBus dependency
From: Michael Kelley @ 2026-05-21 16:49 UTC
  To: kys, haiyangz, wei.liu, decui, longli, jloeser, linux-hyperv
  Cc: linux-kernel, arnd, hamzamahfooz

From: Michael Kelley <mhklinux@outlook.com>

When the VMBus driver is not part of the kernel (CONFIG_HYPERV_VMBUS=n),
the MSHV root driver fails to link:

ERROR: modpost: "hv_vmbus_exists" [drivers/hv/mshv_root.ko] undefined!

Fix this while meeting these requirements:
* It must be possible to include the MSHV root driver without the
  VMBus driver. In such a case, the MSHV root driver can be built-in
  to the kernel image, or it can be built as a separate module.
* If both the MSHV root driver and the VMBus driver are present, the
  MSHV root driver and VMBus driver can both be built-in, or they can
  both be separate modules. Or the MSHV root driver can be a module
  with the VMBus driver built-in, but the reverse is disallowed.
  Regardless of the build choices, the VMBus driver must be loaded
  before the MSHV root driver so that the SynIC is managed properly
  (see comments in the MSHV SynIC code).

The fix has two parts:
* Add a Kconfig entry for MSHV_ROOT to depend on HYPERV_VMBUS if
  HYPERV_VMBUS is present. The entry disallows MSHV_ROOT being
  built-in when HYPERV_VMBUS is a module, but without requiring that
  HYPERV_VMBUS be built.
* Add #ifdefs around MSHV SynIC calls to hv_vmbus_exists(). When
  the VMBus driver is present, these calls establish a module
  dependency to ensure that the VMBus driver loads first when both
  are built as modules. But if the VMBus driver is not present,
  the behavior is as if hv_vmbus_exists() returned "false", and
  there is no module dependency.
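The patch spells the Kconfig rule as "depends on HYPERV_VMBUS if HYPERV_VMBUS". Assuming it evaluates like the long-standing "depends on HYPERV_VMBUS || !HYPERV_VMBUS" idiom, Kconfig's tristate arithmetic (n=0, m=1, y=2; "||" takes the maximum and "!x" is 2-x) gives the upper bound on MSHV_ROOT modeled below. The enum and function names are illustrative only and are not part of Kconfig.

```c
/* Kconfig tristate values, ordered n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* "a || b" in Kconfig evaluates to the maximum of the two. */
static enum tristate tri_or(enum tristate a, enum tristate b)
{
	return a > b ? a : b;
}

/* "!a" in Kconfig evaluates to 2 - a: !n = y, !m = m, !y = n. */
static enum tristate tri_not(enum tristate a)
{
	return (enum tristate)(TRI_Y - a);
}

/*
 * Upper bound that "depends on HYPERV_VMBUS || !HYPERV_VMBUS" places
 * on MSHV_ROOT for a given HYPERV_VMBUS setting.
 */
static enum tristate mshv_root_limit(enum tristate hyperv_vmbus)
{
	return tri_or(hyperv_vmbus, tri_not(hyperv_vmbus));
}
```

So HYPERV_VMBUS=m caps MSHV_ROOT at m (no built-in MSHV with a modular VMBus), while HYPERV_VMBUS=n or y leaves MSHV_ROOT free to be y or m.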

Existing code ensures that the VMBus driver loads first if it is
built-in. The VMBus driver uses subsys_initcall(), which is
initcall level 4. The MSHV root driver uses module_init(), which
becomes device_initcall() when built-in, and device_initcall() is
initcall level 6.

Reported-by: Arnd Bergmann <arnd@arndb.de>
Closes: https://lore.kernel.org/all/20260520074044.923728-1-arnd@kernel.org/
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
---
 drivers/hv/Kconfig      |  1 +
 drivers/hv/mshv_synic.c | 12 ++++++++++--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
index 2d0b3fcb0ff8..aa11bcefddf2 100644
--- a/drivers/hv/Kconfig
+++ b/drivers/hv/Kconfig
@@ -74,6 +74,7 @@ config MSHV_ROOT
 	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
 	# no particular order, making it impossible to reassemble larger pages
 	depends on PAGE_SIZE_4KB
+	depends on HYPERV_VMBUS if HYPERV_VMBUS
 	select EVENTFD
 	select VIRT_XFER_TO_GUEST_WORK
 	select HMM_MIRROR
diff --git a/drivers/hv/mshv_synic.c b/drivers/hv/mshv_synic.c
index 88170ce6b83f..3f72a3dd232d 100644
--- a/drivers/hv/mshv_synic.c
+++ b/drivers/hv/mshv_synic.c
@@ -463,11 +463,15 @@ static int mshv_synic_cpu_init(unsigned int cpu)
 			&spages->synic_event_flags_page;
 	struct hv_synic_event_ring_page **event_ring_page =
 			&spages->synic_event_ring_page;
+	bool vmbus_active = false;
+
 	/*
 	 * VMBus owns SIMP/SIEFP/SCONTROL when it is active.
 	 * See hv_hyp_synic_enable_regs() for that initialization.
 	 */
-	bool vmbus_active = hv_vmbus_exists();
+#if IS_ENABLED(CONFIG_HYPERV_VMBUS)
+	vmbus_active = hv_vmbus_exists();
+#endif

 	/*
 	 * Map the SYNIC message page. When VMBus is not active the
@@ -587,8 +591,12 @@ static int mshv_synic_cpu_exit(unsigned int cpu)
 		&spages->synic_event_flags_page;
 	struct hv_synic_event_ring_page **event_ring_page =
 		&spages->synic_event_ring_page;
+	bool vmbus_active = false;
+
 	/* VMBus owns SIMP/SIEFP/SCONTROL when it is active */
-	bool vmbus_active = hv_vmbus_exists();
+#if IS_ENABLED(CONFIG_HYPERV_VMBUS)
+	vmbus_active = hv_vmbus_exists();
+#endif

 	/* Disable the interrupt */
 	sint.as_uint64 = hv_get_non_nested_msr(HV_MSR_SINT0 + HV_SYNIC_INTERCEPTION_SINT_INDEX);
-- 
2.25.1


* RE: [PATCH v3 1/2] drm/hyperv: validate resolution_count and fix WIN8 fallback
From: Michael Kelley @ 2026-05-21 17:07 UTC
  To: Berkant Koc, Saurabh Sengar, Dexuan Cui, Long Li
  Cc: linux-hyperv@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linux-kernel@vger.kernel.org, K. Y. Srinivasan, Haiyang Zhang,
	Wei Liu, Michael Kelley, Thomas Zimmermann, Maarten Lankhorst,
	Maxime Ripard, Deepak Rawat
In-Reply-To: <1b88bc7edeb2f0153475225b67f19aaca629eca8.1779221799.git.me@berkoc.com>

From: Berkant Koc <me@berkoc.com> Sent: Tuesday, May 19, 2026 1:08 PM
> 
> A SYNTHVID_RESOLUTION_RESPONSE with resolution_count > 64 walks past
> the supported_resolution[SYNTHVID_MAX_RESOLUTION_COUNT] array in the
> parse loop. Bound resolution_count against the array size, folded
> into the existing zero-check.
> 
> When the WIN10 resolution probe fails, the caller in
> hyperv_connect_vsp() left hv->screen_*_max / preferred_* unpopulated,
> which sets mode_config.max_width / max_height to 0 and makes
> drm_internal_framebuffer_create() reject every userspace framebuffer
> with -EINVAL. The pre-WIN10 branch had the same gap for
> preferred_width / preferred_height. Use a single post-probe fallback
> guarded by screen_width_max == 0 so both paths converge on the WIN8
> defaults.
> 
> Signed-off-by: Berkant Koc <me@berkoc.com>
> Assisted-by: Claude:claude-opus-4-7 berkoc-pipeline
> Fixes: 76c56a5affeb ("drm/hyperv: Add DRM driver for hyperv synthetic video device")
> Cc: stable@vger.kernel.org # 5.14+
> ---
>  drivers/gpu/drm/hyperv/hyperv_drm_proto.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/hyperv/hyperv_drm_proto.c b/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
> index 051ecc526..c3d0ff229 100644
> --- a/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
> +++ b/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
> @@ -391,8 +391,11 @@ static int hyperv_get_supported_resolution(struct hv_device *hdev)
>  		return -ETIMEDOUT;
>  	}
> 
> -	if (msg->resolution_resp.resolution_count == 0) {
> -		drm_err(dev, "No supported resolutions\n");
> +	if (msg->resolution_resp.resolution_count == 0 ||
> +	    msg->resolution_resp.resolution_count >
> +	    SYNTHVID_MAX_RESOLUTION_COUNT) {
> +		drm_err(dev, "Invalid resolution count: %d\n",
> +			msg->resolution_resp.resolution_count);
>  		return -ENODEV;
>  	}
> 
> @@ -508,9 +511,13 @@ int hyperv_connect_vsp(struct hv_device *hdev)
>  		ret = hyperv_get_supported_resolution(hdev);
>  		if (ret)
>  			drm_err(dev, "Failed to get supported resolution from host, use default\n");
> -	} else {
> +	}
> +
> +	if (!hv->screen_width_max) {
>  		hv->screen_width_max = SYNTHVID_WIDTH_WIN8;
>  		hv->screen_height_max = SYNTHVID_HEIGHT_WIN8;
> +		hv->preferred_width = SYNTHVID_WIDTH_WIN8;
> +		hv->preferred_height = SYNTHVID_HEIGHT_WIN8;
>  	}
> 
>  	hv->mmio_megabytes = hdev->channel->offermsg.offer.mmio_megabytes;
> --
> 2.47.3
> 

Looks good to me.

Reviewed-by: Michael Kelley <mhklinux@outlook.com>


* RE: [PATCH v3 2/2] drm/hyperv: validate VMBus packet size in receive callback
From: Michael Kelley @ 2026-05-21 17:19 UTC
  To: Berkant Koc, Saurabh Sengar, Dexuan Cui, Long Li
  Cc: linux-hyperv@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linux-kernel@vger.kernel.org, K. Y. Srinivasan, Haiyang Zhang,
	Wei Liu, Michael Kelley, Thomas Zimmermann, Maarten Lankhorst,
	Maxime Ripard, Deepak Rawat
In-Reply-To: <e6e63276cca2901641ab39029e4fd3d621b1ee92.1779221799.git.me@berkoc.com>

From: Berkant Koc <me@berkoc.com> Sent: Tuesday, May 19, 2026 1:09 PM
> 
> hyperv_receive_sub() reads msg->vid_hdr.type and dispatches into one
> of four message-type branches without knowing how many bytes the host
> wrote into hv->recv_buf. The completion path then runs
> memcpy(hv->init_buf, msg, VMBUS_MAX_PACKET_SIZE), so the consumer
> that wakes on wait_for_completion_timeout() can read up to 16 KiB of
> residue from a prior message as if it were the response payload.
> 
> Pass bytes_recvd into hyperv_receive_sub() and reject any packet that
> does not cover the pipe + synthvid header. For each of the three
> completion-driving types (SYNTHVID_VERSION_RESPONSE,
> SYNTHVID_RESOLUTION_RESPONSE, SYNTHVID_VRAM_LOCATION_ACK) also
> require the type-specific payload before memcpy/complete, and apply
> the same rule to SYNTHVID_FEATURE_CHANGE before reading is_dirt_needed.
> The memcpy then uses bytes_recvd, which is bounded by
> VMBUS_MAX_PACKET_SIZE through the call to vmbus_recvpacket().
> 
> Rejected packets are reported via drm_err_ratelimited() rather than
> silently dropped, matching the CoCo-hardened pattern in
> hv_kvp_onchannelcallback().

We discussed several issues with this patch in the feedback
from Sashiko. But see one more issue below.

> 
> Fixes: 76c56a5affeb ("drm/hyperv: Add DRM driver for hyperv synthetic video device")
> Cc: stable@vger.kernel.org # 5.14+
> Signed-off-by: Berkant Koc <me@berkoc.com>
> Assisted-by: Claude:claude-opus-4-7 berkoc-pipeline
> ---
>  drivers/gpu/drm/hyperv/hyperv_drm_proto.c | 42 +++++++++++++++++++++--
>  1 file changed, 39 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/hyperv/hyperv_drm_proto.c b/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
> index c3d0ff229..12d3feb4f 100644
> --- a/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
> +++ b/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
> @@ -420,26 +420,62 @@ static int hyperv_get_supported_resolution(struct hv_device *hdev)
>  	return 0;
>  }
> 
> -static void hyperv_receive_sub(struct hv_device *hdev)
> +static void hyperv_receive_sub(struct hv_device *hdev, u32 bytes_recvd)
>  {
>  	struct hyperv_drm_device *hv = hv_get_drvdata(hdev);
>  	struct synthvid_msg *msg;
> +	size_t hdr_size;
> 
>  	if (!hv)
>  		return;
> 
> +	hdr_size = sizeof(struct pipe_msg_hdr) +
> +		   sizeof(struct synthvid_msg_hdr);
> +	if (bytes_recvd < hdr_size) {
> +		drm_err_ratelimited(&hv->dev,
> +				    "synthvid packet too small for header: %u\n",
> +				    bytes_recvd);
> +		return;
> +	}
> +
>  	msg = (struct synthvid_msg *)hv->recv_buf;
> 
>  	/* Complete the wait event */
>  	if (msg->vid_hdr.type == SYNTHVID_VERSION_RESPONSE ||
>  	    msg->vid_hdr.type == SYNTHVID_RESOLUTION_RESPONSE ||
>  	    msg->vid_hdr.type == SYNTHVID_VRAM_LOCATION_ACK) {
> -		memcpy(hv->init_buf, msg, VMBUS_MAX_PACKET_SIZE);
> +		size_t need = hdr_size;
> +
> +		switch (msg->vid_hdr.type) {
> +		case SYNTHVID_VERSION_RESPONSE:
> +			need += sizeof(struct synthvid_version_resp);
> +			break;
> +		case SYNTHVID_RESOLUTION_RESPONSE:
> +			need += sizeof(struct synthvid_supported_resolution_resp);

I'm concerned that this might be too aggressive.  The last element
of struct synthvid_supported_resolution_resp is an array, and there's
a count in the message describing how many elements of the array
are populated. But Hyper-V may not (and probably doesn't) include
unpopulated elements in the response message.  So "need" is likely
calculated as too large. Are you able to test this in a Hyper-V VM to
confirm?

I think you'll find it necessary to first check that enough bytes
have arrived to read the "resolution_count" field, and then use
that value to calculate "need".  There are several other places
in hardened VMBus drivers that use that same two-level
technique. It's a pain, but there's not really any alternative.
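A minimal sketch of that two-level technique follows. The struct layout is a hypothetical stand-in modeled loosely on the resolution response (the real synthvid definitions differ in detail, for example they are __packed), and "hdr" stands for the pipe + synthvid header bytes that precede the payload.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RES 64	/* stand-in for SYNTHVID_MAX_RESOLUTION_COUNT */

/* Hypothetical layouts for illustration only. */
struct res_entry {
	uint16_t width;
	uint16_t height;
};

struct res_resp {
	uint8_t edid_block[128];
	uint8_t resolution_count;
	uint8_t default_resolution_index;
	uint8_t is_standard;
	struct res_entry supported[MAX_RES];	/* only 'count' entries sent */
};

static bool res_resp_len_ok(const struct res_resp *resp, size_t hdr,
			    size_t bytes_recvd)
{
	size_t need = hdr + offsetof(struct res_resp, supported);

	/* Level 1: the fixed prefix, including resolution_count, arrived. */
	if (bytes_recvd < need)
		return false;

	/* Bound the count before using it in any size arithmetic. */
	if (resp->resolution_count == 0 || resp->resolution_count > MAX_RES)
		return false;

	/* Level 2: require only the populated array entries. */
	need += (size_t)resp->resolution_count * sizeof(struct res_entry);
	return bytes_recvd >= need;
}
```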

Michael

> +			break;
> +		case SYNTHVID_VRAM_LOCATION_ACK:
> +			need += sizeof(struct synthvid_vram_location_ack);
> +			break;
> +		}
> +		if (bytes_recvd < need) {
> +			drm_err_ratelimited(&hv->dev,
> +					    "synthvid packet too small for type %u: %u < %zu\n",
> +					    msg->vid_hdr.type, bytes_recvd, need);
> +			return;
> +		}
> +		memcpy(hv->init_buf, msg, bytes_recvd);
>  		complete(&hv->wait);
>  		return;
>  	}
> 
>  	if (msg->vid_hdr.type == SYNTHVID_FEATURE_CHANGE) {
> +		if (bytes_recvd < hdr_size +
> +		    sizeof(struct synthvid_feature_change)) {
> +			drm_err_ratelimited(&hv->dev,
> +					    "synthvid feature change packet too small: %u\n",
> +					    bytes_recvd);
> +			return;
> +		}
>  		hv->dirt_needed = msg->feature_chg.is_dirt_needed;
>  		if (hv->dirt_needed)
>  			hyperv_hide_hw_ptr(hv->hdev);
> @@ -466,7 +502,7 @@ static void hyperv_receive(void *ctx)
>  				       &bytes_recvd, &req_id);
>  		if (bytes_recvd > 0 &&
>  		    recv_buf->pipe_hdr.type == PIPE_MSG_DATA)
> -			hyperv_receive_sub(hdev);
> +			hyperv_receive_sub(hdev, bytes_recvd);
>  	} while (bytes_recvd > 0 && ret == 0);
>  }
> 
> --
> 2.47.3
> 



* [PATCH 1/1] x86/hyperv: Refactor hv_smp_prepare_cpus()
From: Michael Kelley @ 2026-05-21 19:23 UTC
  To: kys, haiyangz, wei.liu, decui, longli, tglx, mingo, bp,
	dave.hansen, x86, hpa, linux-hyperv, linux-kernel, linux-arch

From: Michael Kelley <mhklinux@outlook.com>

hv_smp_prepare_cpus() currently handles two disjoint cases: running as
a fully enlightened SNP guest and running in the root partition. The
root partition case has recently gained more steps, and as a result
the function is getting somewhat messy.

Refactor the code by putting the SNP and root cases into separate
functions. For the root case, move most of the code into hv_proc.c,
which is built only when MSHV_ROOT is configured. The move reduces
the surface area between the main code and the root partition
extensions. Several stubs go away, with an overall modest reduction
in lines of code.

No functional change.

Signed-off-by: Michael Kelley <mhklinux@outlook.com>
---
 arch/x86/kernel/cpu/mshyperv.c | 52 +++++++---------------------------
 drivers/hv/hv_proc.c           | 35 +++++++++++++++++++++--
 include/asm-generic/mshyperv.h | 17 ++---------
 3 files changed, 45 insertions(+), 59 deletions(-)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 640e6b223c2d..442156056cd2 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -32,7 +32,6 @@
 #include <asm/msr.h>
 #include <asm/nmi.h>
 #include <clocksource/hyperv_timer.h>
-#include <asm/numa.h>
 #include <asm/svm.h>
 
 /* Is Linux running on nested Microsoft Hypervisor */
@@ -413,46 +412,16 @@ static void __init hv_smp_prepare_boot_cpu(void)
 #endif
 }
 
-static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
+static void __init hv_smp_prepare_cpus_for_snp(unsigned int max_cpus)
 {
-#ifdef CONFIG_X86_64
-	int i;
-	int ret;
-#endif
-
 	native_smp_prepare_cpus(max_cpus);
+	apic->wakeup_secondary_cpu_64 = hv_snp_boot_ap;
+}
 
-	/*
-	 *  Override wakeup_secondary_cpu_64 callback for SEV-SNP
-	 *  enlightened guest.
-	 */
-	if (!ms_hyperv.paravisor_present && hv_isolation_type_snp()) {
-		apic->wakeup_secondary_cpu_64 = hv_snp_boot_ap;
-		return;
-	}
-
-#ifdef CONFIG_X86_64
-	/* If AP LPs exist, we are in a kexec'd kernel and VPs already exist */
-	if (num_present_cpus() == 1 || hv_lp_exists(1))
-		return;
-
-	for_each_present_cpu(i) {
-		if (i == 0)
-			continue;
-		ret = hv_call_add_logical_proc(numa_cpu_node(i), i, cpu_physical_id(i));
-		BUG_ON(ret);
-	}
-
-	ret = hv_call_notify_all_processors_started();
-	WARN_ON(ret);
-
-	for_each_present_cpu(i) {
-		if (i == 0)
-			continue;
-		ret = hv_call_create_vp(numa_cpu_node(i), hv_current_partition_id, i, i);
-		BUG_ON(ret);
-	}
-#endif
+static void __init hv_smp_prepare_cpus_for_root(unsigned int max_cpus)
+{
+	native_smp_prepare_cpus(max_cpus);
+	hv_smp_prep_cpus();
 }
 #endif
 
@@ -722,9 +691,10 @@ static void __init ms_hyperv_init_platform(void)
 
 # ifdef CONFIG_SMP
 	smp_ops.smp_prepare_boot_cpu = hv_smp_prepare_boot_cpu;
-	if (hv_root_partition() ||
-	    (!ms_hyperv.paravisor_present && hv_isolation_type_snp()))
-		smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus;
+	if (!ms_hyperv.paravisor_present && hv_isolation_type_snp())
+		smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus_for_snp;
+	else if (hv_root_partition())
+		smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus_for_root;
 # endif
 
 	/*
diff --git a/drivers/hv/hv_proc.c b/drivers/hv/hv_proc.c
index 57b2c64197cb..b8aa76a7b19b 100644
--- a/drivers/hv/hv_proc.c
+++ b/drivers/hv/hv_proc.c
@@ -8,6 +8,7 @@
 #include <linux/minmax.h>
 #include <linux/export.h>
 #include <asm/mshyperv.h>
+#include <asm/numa.h>
 
 /*
  * See struct hv_deposit_memory. The first u64 is partition ID, the rest
@@ -154,7 +155,7 @@ bool hv_result_needs_memory(u64 status)
 }
 EXPORT_SYMBOL_GPL(hv_result_needs_memory);
 
-int hv_call_add_logical_proc(int node, u32 lp_index, u32 apic_id)
+static int hv_call_add_logical_proc(int node, u32 lp_index, u32 apic_id)
 {
 	struct hv_input_add_logical_processor *input;
 	struct hv_output_add_logical_processor *output;
@@ -240,7 +241,7 @@ int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags)
 }
 EXPORT_SYMBOL_GPL(hv_call_create_vp);
 
-int hv_call_notify_all_processors_started(void)
+static int hv_call_notify_all_processors_started(void)
 {
 	struct hv_input_notify_partition_event *input;
 	u64 status;
@@ -262,7 +263,7 @@ int hv_call_notify_all_processors_started(void)
 	return ret;
 }
 
-bool hv_lp_exists(u32 lp_index)
+static bool hv_lp_exists(u32 lp_index)
 {
 	struct hv_input_get_logical_processor_run_time *input;
 	struct hv_output_get_logical_processor_run_time *output;
@@ -286,3 +287,31 @@ bool hv_lp_exists(u32 lp_index)
 
 	return hv_result_success(status);
 }
+
+void hv_smp_prep_cpus(void)
+{
+#ifdef CONFIG_X86_64
+	int i, ret;
+
+	/* If AP LPs exist, we are in a kexec'd kernel and VPs already exist */
+	if (num_present_cpus() == 1 || hv_lp_exists(1))
+		return;
+
+	for_each_present_cpu(i) {
+		if (i == 0)
+			continue;
+		ret = hv_call_add_logical_proc(numa_cpu_node(i), i, cpu_physical_id(i));
+		BUG_ON(ret);
+	}
+
+	ret = hv_call_notify_all_processors_started();
+	WARN_ON(ret);
+
+	for_each_present_cpu(i) {
+		if (i == 0)
+			continue;
+		ret = hv_call_create_vp(numa_cpu_node(i), hv_current_partition_id, i, i);
+		BUG_ON(ret);
+	}
+#endif
+}
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index bf601d67cecb..ea1c4acda1ec 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -346,9 +346,7 @@ static inline bool hv_parent_partition(void)
 bool hv_result_needs_memory(u64 status);
 int hv_deposit_memory_node(int node, u64 partition_id, u64 status);
 int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
-int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
-int hv_call_notify_all_processors_started(void);
-bool hv_lp_exists(u32 lp_index);
+void hv_smp_prep_cpus(void);
 int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
 
 #else /* CONFIG_MSHV_ROOT */
@@ -364,18 +362,7 @@ static inline int hv_call_deposit_pages(int node, u64 partition_id, u32 num_page
 {
 	return -EOPNOTSUPP;
 }
-static inline int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id)
-{
-	return -EOPNOTSUPP;
-}
-static inline int hv_call_notify_all_processors_started(void)
-{
-	return -EOPNOTSUPP;
-}
-static inline bool hv_lp_exists(u32 lp_index)
-{
-	return false;
-}
+static inline void hv_smp_prep_cpus(void) {}
 static inline int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags)
 {
 	return -EOPNOTSUPP;
-- 
2.25.1



* Re: [PATCH 1/1] mshv: Add conditional VMBus dependency
From: Arnd Bergmann @ 2026-05-21 20:15 UTC
  To: Michael Kelley, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, longli, Jork Loeser, linux-hyperv
  Cc: linux-kernel, hamzamahfooz
In-Reply-To: <20260521164921.1995-1-mhklkml@zohomail.com>

On Thu, May 21, 2026, at 18:49, Michael Kelley wrote:
>
> Existing code ensures that the VMBus driver loads first if it is
> built-in. The VMBus driver uses subsys_initcall(), which is
> initcall level 4. The MSHV root driver uses module_init(), which
> becomes device_initcall() when built-in, and device_initcall() is
> initcall level 6.
>
> Reported-by: Arnd Bergmann <arnd@arndb.de>
> Closes: https://lore.kernel.org/all/20260520074044.923728-1-arnd@kernel.org/
> Signed-off-by: Michael Kelley <mhklinux@outlook.com>

Looks good to me, thanks for fixing it!

Acked-by: Arnd Bergmann <arnd@arndb.de>

>  	/*
>  	 * VMBus owns SIMP/SIEFP/SCONTROL when it is active.
>  	 * See hv_hyp_synic_enable_regs() for that initialization.
>  	 */
> -	bool vmbus_active = hv_vmbus_exists();
> +#if IS_ENABLED(CONFIG_HYPERV_VMBUS)
> +	vmbus_active = hv_vmbus_exists();
> +#endif

I would usually write this as 

        if (IS_ENABLED(CONFIG_HYPERV_VMBUS))
                  vmbus_active = hv_vmbus_exists();

for readability, since the hv_vmbus_exists() declaration is still
visible and the IS_ENABLED() check avoids the link failure.
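For illustration, the sketch below copies the IS_ENABLED() helpers as found in include/linux/kconfig.h so that it builds in userspace, simulates CONFIG_HYPERV_VMBUS=m, and uses a hypothetical stub for hv_vmbus_exists(). Because IS_ENABLED() expands to a literal 0 or 1, the call is always parsed and type-checked, and an optimizing build drops the branch (and its symbol reference) when the option is off.

```c
#include <stdbool.h>

/* Helpers copied from include/linux/kconfig.h for a userspace build. */
#define __ARG_PLACEHOLDER_1 0,
#define __take_second_arg(__ignored, val, ...) val
#define __or(x, y)			___or(x, y)
#define ___or(x, y)			____or(__ARG_PLACEHOLDER_##x, y)
#define ____or(arg1_or_junk, y)		__take_second_arg(arg1_or_junk 1, y)
#define __is_defined(x)			___is_defined(x)
#define ___is_defined(val)		____is_defined(__ARG_PLACEHOLDER_##val)
#define ____is_defined(arg1_or_junk)	__take_second_arg(arg1_or_junk 1, 0)
#define IS_BUILTIN(option) __is_defined(option)
#define IS_MODULE(option) __is_defined(option##_MODULE)
#define IS_ENABLED(option) __or(IS_BUILTIN(option), IS_MODULE(option))

/* Simulate CONFIG_HYPERV_VMBUS=m: only the _MODULE symbol is defined. */
#define CONFIG_HYPERV_VMBUS_MODULE 1

/* Hypothetical stand-in for the real VMBus helper. */
static bool hv_vmbus_exists(void)
{
	return true;
}

static bool vmbus_active_now(void)
{
	bool vmbus_active = false;

	/* Constant condition: the compiler still checks the call. */
	if (IS_ENABLED(CONFIG_HYPERV_VMBUS))
		vmbus_active = hv_vmbus_exists();

	return vmbus_active;
}
```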

      Arnd


* Re: [PATCH v3 36/41] x86/kvmclock: Get local APIC bus frequency from PV CPUID Timing Info
From: Sean Christopherson @ 2026-05-21 20:34 UTC
  To: David Woodhouse
  Cc: tglx@kernel.org, longli@microsoft.com, luto@kernel.org,
	alexey.makhalov@broadcom.com, jstultz@google.com,
	dave.hansen@linux.intel.com, ajay.kaher@broadcom.com,
	jan.kiszka@siemens.com, haiyangz@microsoft.com, kas@kernel.org,
	pbonzini@redhat.com, kys@microsoft.com, decui@microsoft.com,
	daniel.lezcano@kernel.org, wei.liu@kernel.org,
	peterz@infradead.org, jgross@suse.com, boris.ostrovsky@oracle.com,
	linux-coco@lists.linux.dev, kvm@vger.kernel.org,
	mhklinux@outlook.com, thomas.lendacky@amd.com,
	linux-kernel@vger.kernel.org,
	bcm-kernel-feedback-list@broadcom.com, tglx@linutronix.de,
	nikunj@amd.com, xen-devel@lists.xenproject.org,
	linux-hyperv@vger.kernel.org, vkuznets@redhat.com,
	rick.p.edgecombe@intel.com, virtualization@lists.linux.dev,
	sboyd@kernel.org, x86@kernel.org
In-Reply-To: <7489ff3cc1ff402bf0ade38272fc52dcbcc75fc1.camel@amazon.co.uk>

On Wed, May 20, 2026, David Woodhouse wrote:
> On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> > When running as a KVM guest with kvmclock support enabled, stuff the APIC
> > timer period/frequency with the local APIC bus frequency reported in
> > CPUID.0x40000010.EBX instead of trying to calibrate/guess the frequency.
> > 
> > See Documentation/virt/kvm/x86/cpuid.rst for details.
> > 
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> 
> I still don't much like the way this is done inside kvm_get_tsc_khz().

Yeah, I don't like it either (understatement).  Aha!  native_calibrate_tsc() is
the oddball, all of the PV flows stuff lapic_timer_period when parsing the initial
timing info.  I'll just do that.  Blindly writing a global is all kinds of fugly,
but that's a future problem to solve.

> We also probably ought to be looking for the timing leaf on other
> hypervisors including VMware 

VMware gets the frequency via hypercall.  Why, I have no idea.  I'll let the
VMware folks deal with that.

	eax = vmware_hypercall3(VMWARE_CMD_GETHZ, UINT_MAX, &ebx, &ecx);

> and probably Bhyve too.  Should it be done somewhere else?

I'm not opposed to that, though I don't know that it'd be a net positive. The
"hard" part of getting the info is finding the CPUID base and checking if the
leaf is available.  Unlike the native CPUID leaf, no math is necessary, and so
once the leaf is obtained, getting the frequency is trivial.
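To make the "no math" point concrete, here is a sketch contrasting the native CPUID.0x15 calculation with the PV timing leaf. Register meanings follow the Intel SDM for leaf 0x15 (EAX denominator, EBX numerator, ECX crystal Hz) and Documentation/virt/kvm/x86/cpuid.rst for 0x40000010 (EAX = TSC kHz, EBX = APIC bus kHz). The function names are hypothetical, and the ticks-per-jiffy conversion assumes lapic_timer_period counts APIC timer ticks per jiffy.

```c
#include <stdint.h>

/*
 * Native leaf 0x15: TSC = crystal * EBX / EAX. Returns kHz, or 0 where
 * the real code would need further fallbacks (e.g. ECX not enumerated).
 */
static uint32_t native_leaf15_tsc_khz(uint32_t eax, uint32_t ebx, uint32_t ecx)
{
	uint64_t crystal_khz = ecx / 1000;

	if (!eax || !ebx || !crystal_khz)
		return 0;
	return (uint32_t)(crystal_khz * ebx / eax);
}

/* PV timing leaf: EAX already holds the TSC frequency in kHz. */
static uint32_t pv_timing_tsc_khz(uint32_t eax)
{
	return eax;
}

/* PV timing leaf: EBX is the APIC bus frequency in kHz; convert to ticks per jiffy. */
static uint32_t pv_timing_lapic_ticks_per_jiffy(uint32_t ebx_khz, uint32_t hz)
{
	return (uint32_t)((uint64_t)ebx_khz * 1000 / hz);
}
```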

Regardless, I definitely don't want to take it on in this series. :-)



* Re: [PATCH v3 29/41] x86/paravirt: Plumb a return code into __paravirt_set_sched_clock()
From: Sean Christopherson @ 2026-05-21 20:35 UTC
  To: David Woodhouse
  Cc: Kiryl Shutsemau, Paolo Bonzini, K. Y. Srinivasan, Haiyang Zhang,
	Wei Liu, Dexuan Cui, Long Li, Ajay Kaher, Alexey Makhalov,
	Jan Kiszka, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Juergen Gross, Daniel Lezcano, Thomas Gleixner, John Stultz,
	Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
	linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
	Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <13d79ba1e0450068c9573ccd8deb3ec007aea8d6.camel@infradead.org>

On Thu, May 21, 2026, David Woodhouse wrote:
> On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> > Add a return code to __paravirt_set_sched_clock() so that the kernel can
> > reject attempts to use a PV sched_clock without breaking the caller.  E.g.
> > when running as a CoCo VM with a secure TSC, using a PV clock is generally
> > undesirable.
> > 
> > Note, kvmclock is the only PV clock that does anything "extra" beyond
> > simply registering itself as sched_clock, i.e. is the only caller that
> > needs to check the new return value.
> > 
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> 
> Oooh... can we use this to reject the kvmclock when we have a stable
> and reliable TSC even for non-CoCo guests?

Yes, but I would much rather "fix" kvmclock to not even attempt to register itself
as the sched_clock (which this series does).


* Re: [PATCH v3 02/41] x86/tsc: Add helper to register CPU and TSC freq calibration routines
From: Sean Christopherson @ 2026-05-21 20:53 UTC
  To: David Woodhouse
  Cc: Kiryl Shutsemau, Paolo Bonzini, K. Y. Srinivasan, Haiyang Zhang,
	Wei Liu, Dexuan Cui, Long Li, Ajay Kaher, Alexey Makhalov,
	Jan Kiszka, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Juergen Gross, Daniel Lezcano, Thomas Gleixner, John Stultz,
	Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
	linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
	Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <44e0d60548d317fd59895f18bd17220dfb2f834b.camel@infradead.org>

On Wed, May 20, 2026, David Woodhouse wrote:
> On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> > Add a helper to register non-native, i.e. PV and CoCo, CPU and TSC
> > frequency calibration routines.  This will allow consolidating handling
> > of common TSC properties that are forced by hypervisor (PV routines),
> > and will also allow adding sanity checks to guard against overriding a
> > TSC calibration routine with a routine that is less robust/trusted.
> > 
> > Make the CPU calibration routine optional, as Xen (very sanely) doesn't
> > assume the CPU runs at the same frequency as the TSC.
> > 
> > Wrap the helper in an #ifdef to document that the kernel overrides
> > the native routines when running as a VM, and to guard against unwanted
> > usage.  Add a TODO to call out that AMD_MEM_ENCRYPT is a mess and doesn't
> > depend on HYPERVISOR_GUEST because it gates both guest and host code.
> > 
> > No functional change intended.
> > 
> > Reviewed-by: Michael Kelley <mhklinux@outlook.com>
> > Tested-by: Michael Kelley <mhklinux@outlook.com>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> 
> Mildly concerned that we might want to support multiple options — does
> it have CPUID 0x15? Does it have 0x40000x10? Does it have a pvclock?
> There are various permutations of those which are perhaps best handled
> by *trying* each one, in some order, and populating a struct with the
> answers?
> 
> But on the basis that perfect is the enemy of good,

This has been bothering me too.

Aha!  AHA!  Idea.

... 4 hours later ...

Mhahahaahah, victory is mine!!!!

TL;DR: Overriding x86_platform_ops hooks is dumb.

To your point about making an informed decision, that's essentially what this series
is already doing, just in a very roundabout way:

  1. x86_platform.calibrate_{cpu,tsc}() are initialized to "native" versions
  2. Hypervisor init code runs and conditionally overrides calibrate_{cpu,tsc}()
  3. CoCo init code runs and conditionally overrides calibrate_{cpu,tsc}()

So the ordering you want is already there, as is "trying" each source to some
extent, in the form of steps #2 and #3 overriding the hooks if and only if their
source of information is valid.  For all intents and purposes, the hardening I
was adding by formalizing the calibration overrides was to enforce the above ordering.

But that's obviously all but impossible to follow, _and_ it's pointless.

For every PV case, including TDX and SNP, "calibration" is simply information
retrieval, i.e. it never changes (barring broken hypervisors/firmware), and the
information is always available during early boot.

Contrast that with the pre-CPUID CPU frequency calibration, where the frequency
might change, the kernel is making a best guess based on other timekeeping sources,
and not all timekeeping sources are available during early boot.

And so overriding x86_platform.calibrate_{cpu,tsc}() for PV code is completely
unnecessary, because steps #2 and #3 already know the frequency when they override
the hooks, and "success" is guaranteed, i.e. the kernel won't have to switch to a
"late" calibration flow.

If we provide x86_hyper_init hooks:

	unsigned int (*get_tsc_khz)(void);
	unsigned int (*get_cpu_khz)(void);

then we can kill off x86_platform.calibrate_{cpu,tsc}() entirely, explicitly
define the preferred ordering (user-forced => CoCo => Hypervisor => native), and
dedup some of the hypervisor code.

E.g. this is what I've got for the early flow.  Testing now. 

  void __init tsc_early_init(void)
  {
	unsigned int known_cpu_khz = 0, known_tsc_khz = 0;

	if (!boot_cpu_has(X86_FEATURE_TSC))
		return;
	/* Don't change UV TSC multi-chassis synchronization */
	if (is_early_uv_system())
		return;

	if (x86_init.hyper.get_cpu_khz)
		known_cpu_khz = x86_init.hyper.get_cpu_khz();

	if (tsc_early_khz)
		known_tsc_khz = tsc_early_khz;
	else if (cc_platform_has(CC_ATTR_GUEST_SNP_SECURE_TSC))
		known_tsc_khz = snp_secure_tsc_init();
	else if (boot_cpu_has(X86_FEATURE_TDX_GUEST))
		known_tsc_khz = tdx_tsc_init();

	/*
	 * If the TSC frequency is still unknown, i.e. not provided by the user
	 * or by trusted firmware, try to get it from the hypervisor (which is
	 * untrusted when running as a CoCo guest).
	 */
	if (!known_tsc_khz && x86_init.hyper.get_tsc_khz)
		known_tsc_khz = x86_init.hyper.get_tsc_khz();

	if (known_tsc_khz)
		setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);

	if (!determine_cpu_tsc_frequencies(true, known_cpu_khz, known_tsc_khz))
		return;
	tsc_enable_sched_clock();
  }
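The precedence encoded in tsc_early_init() (user-forced, then CoCo, then hypervisor, then native calibration) can be modeled in a few lines of userspace C. The struct and function below are hypothetical and only mirror the draft's control flow, with 0 meaning "this source has no answer".

```c
#include <stdbool.h>

/* Hypothetical model of the frequency sources, in kHz (0 = unknown). */
struct tsc_sources {
	unsigned int user_khz;	/* user-forced, like tsc_early_khz */
	unsigned int coco_khz;	/* trusted firmware: SNP Secure TSC or TDX */
	unsigned int hyper_khz;	/* hypervisor-provided, untrusted for CoCo */
};

/*
 * Returns the known TSC frequency in kHz, or 0 when the kernel must fall
 * back to native calibration. *known_freq mirrors whether
 * X86_FEATURE_TSC_KNOWN_FREQ would be forced.
 */
static unsigned int pick_tsc_khz(const struct tsc_sources *s, bool *known_freq)
{
	unsigned int khz = s->user_khz ? s->user_khz : s->coco_khz;

	/* Ask the hypervisor only if no more trusted source answered. */
	if (!khz)
		khz = s->hyper_khz;

	*known_freq = khz != 0;
	return khz;
}
```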


* [PATCH v4 0/2] drm/hyperv: harden host message parsing
From: Berkant Koc @ 2026-05-21 20:41 UTC
  To: Saurabh Sengar, Dexuan Cui, Long Li
  Cc: linux-hyperv, dri-devel, linux-kernel, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Michael Kelley, Thomas Zimmermann,
	Maarten Lankhorst, Maxime Ripard, Deepak Rawat

This series fixes two independent issues in the synthetic video driver,
both of which stem from trusting unvalidated host data.

1/2 bounds resolution_count from SYNTHVID_RESOLUTION_RESPONSE against
the supported_resolution[] array, and populates WIN8 defaults for
hv->screen_*_max / hv->preferred_* in both the WIN10-probe-failure
path and the pre-WIN10 path, so a failed or pre-WIN10 probe yields a
usable display instead of having drm_internal_framebuffer_create()
reject every userspace framebuffer with -EINVAL.

2/2 forwards bytes_recvd from vmbus_recvpacket() into the sub-handler,
rejects packets that do not cover the synthvid header, and requires
the type-specific payload size before memcpy/complete or before
reading the feature-change byte. Rejected packets are logged via
drm_err_ratelimited() instead of being silently dropped, matching the
CoCo-hardened pattern in hv_kvp_onchannelcallback().

1/2 is unchanged from v3 and carries Michael Kelley's Reviewed-by.

Changes since v3 (per review by Michael Kelley):

  2/2: validate SYNTHVID_RESOLUTION_RESPONSE against its actual
       variable length. The response carries resolution_count entries,
       not the full SYNTHVID_MAX_RESOLUTION_COUNT array, so requiring
       sizeof(struct synthvid_supported_resolution_resp) rejected the
       shorter responses the host legitimately sends and broke
       resolution probing. Require the fixed prefix, read and bound
       resolution_count, then require only the count-sized array.

  2/2: only run hyperv_receive_sub() when vmbus_recvpacket() returned
       success. v3 dropped the bytes_recvd upper bound as redundant,
       which holds only on a successful receive: on -ENOBUFS
       vmbus_recvpacket() reports the required length in bytes_recvd,
       which can exceed the 16 KiB hv->recv_buf, and the subsequent
       memcpy(hv->init_buf, msg, bytes_recvd) would read and write
       past both buffers. Gating on the success return restores the
       invariant that made the bound redundant, so an oversized host
       packet is dropped rather than copied.
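
The -ENOBUFS hazard can be illustrated with a small userspace model. model_recvpacket() is hypothetical; it only follows the contract described above, where a too-small buffer yields -ENOBUFS and the required length in bytes_recvd. A 16-byte buffer stands in for the 16 KiB one.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MODEL_BUF_SIZE 16	/* stands in for the 16 KiB VMBUS_MAX_PACKET_SIZE */

/*
 * Hypothetical model of the receive contract: on success the packet is
 * copied and its length reported; if it does not fit, -ENOBUFS is returned
 * and *bytes_recvd holds the length that would have been needed.
 */
static int model_recvpacket(const uint8_t *pkt, uint32_t pkt_len,
			    uint8_t *buf, uint32_t buflen,
			    uint32_t *bytes_recvd)
{
	*bytes_recvd = pkt_len;
	if (pkt_len > buflen)
		return -ENOBUFS;
	memcpy(buf, pkt, pkt_len);
	return 0;
}

/* Returns the number of bytes copied into init_buf (0 if dropped). */
static uint32_t model_receive(const uint8_t *pkt, uint32_t pkt_len,
			      uint8_t recv_buf[MODEL_BUF_SIZE],
			      uint8_t init_buf[MODEL_BUF_SIZE])
{
	uint32_t bytes_recvd = 0;
	int ret = model_recvpacket(pkt, pkt_len, recv_buf, MODEL_BUF_SIZE,
				   &bytes_recvd);

	/* Without !ret, an oversized packet would make this memcpy overrun. */
	if (!ret && bytes_recvd > 0) {
		memcpy(init_buf, recv_buf, bytes_recvd);
		return bytes_recvd;
	}
	return 0;
}
```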

Changes since v2 (per review by Michael Kelley):

  1/2: dropped the reinit_completion() change; the stale completion can
       only outlive its request in hyperv_vmbus_resume() after a
       get_supported_resolution() timeout, which is a narrower fix that
       belongs in a separate patch against the resume path. Pre-WIN10
       branch now also populates hv->preferred_*. The else branch is
       gone; a single screen_width_max == 0 check covers both the
       pre-WIN10 case and a failed WIN10 probe.

  2/2: added a per-type switch for the three completion-driving message
       types so the wait-completion path validates payload size before
       memcpy/complete. Every reject path emits drm_err_ratelimited()
       rather than returning silently.

Changes since v1:

  1/2: bound resolution_count check folded into the existing zero check;
       populate WIN8 defaults when hyperv_get_supported_resolution()
       fails.
  2/2: forward bytes_recvd into hyperv_receive_sub(); enforce the pipe +
       synthvid header minimum; check synthvid_feature_change payload
       size before reading is_dirt_needed.

The shared init_buf reuse (a duplicate or late host response can
overwrite init_buf between successive request/response cycles) and the
related completion reinit are real but orthogonal to this size
validation. As discussed on v2, they are queued as a separate follow-up
against the resume/expected-type path once this series lands.

This series is verified by static analysis and code inspection against
the synthvid protocol structures and the vmbus_recvpacket() contract. I
do not currently have a Hyper-V test environment to exercise the receive
and resolution-probe paths at runtime, so confirmation from someone who
can run it in a Hyper-V VM would be welcome.

Both patches carry an Assisted-by: Claude:claude-opus-4-7 berkoc-pipeline
trailer per the kernel coding-assistants policy. Code, analysis and
review responses are mine; the model is used as a structured reviewer
under human verification.

Berkant Koc (2):
  drm/hyperv: validate resolution_count and fix WIN8 fallback
  drm/hyperv: validate VMBus packet size in receive callback

 drivers/gpu/drm/hyperv/hyperv_drm_proto.c | 76 ++++++++++++++++++++---
 1 file changed, 69 insertions(+), 7 deletions(-)

base-commit: 4bf5d3da79c48e1df4bab82c9680c53adeff7820
-- 
2.47.3

* [PATCH v4 1/2] drm/hyperv: validate resolution_count and fix WIN8 fallback
From: Berkant Koc @ 2026-05-19 20:08 UTC (permalink / raw)
  To: Saurabh Sengar, Dexuan Cui, Long Li
  Cc: linux-hyperv, dri-devel, linux-kernel, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Michael Kelley, Thomas Zimmermann,
	Maarten Lankhorst, Maxime Ripard, Deepak Rawat
In-Reply-To: <cover.1779396074.git.me@berkoc.com>

A SYNTHVID_RESOLUTION_RESPONSE with resolution_count > 64 walks past
the supported_resolution[SYNTHVID_MAX_RESOLUTION_COUNT] array in the
parse loop. Bound resolution_count against the array size, folded
into the existing zero-check.

When the WIN10 resolution probe fails, the caller in
hyperv_connect_vsp() left hv->screen_*_max / preferred_* unpopulated,
which sets mode_config.max_width / max_height to 0 and makes
drm_internal_framebuffer_create() reject every userspace framebuffer
with -EINVAL. The pre-WIN10 branch had the same gap for
preferred_width / preferred_height. Use a single post-probe fallback
guarded by screen_width_max == 0 so both paths converge on the WIN8
defaults.

Signed-off-by: Berkant Koc <me@berkoc.com>
Assisted-by: Claude:claude-opus-4-7 berkoc-pipeline
Fixes: 76c56a5affeb ("drm/hyperv: Add DRM driver for hyperv synthetic video device")
Cc: stable@vger.kernel.org # 5.14+
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
---
 drivers/gpu/drm/hyperv/hyperv_drm_proto.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/hyperv/hyperv_drm_proto.c b/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
index 051ecc526..c3d0ff229 100644
--- a/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
+++ b/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
@@ -391,8 +391,11 @@ static int hyperv_get_supported_resolution(struct hv_device *hdev)
 		return -ETIMEDOUT;
 	}
 
-	if (msg->resolution_resp.resolution_count == 0) {
-		drm_err(dev, "No supported resolutions\n");
+	if (msg->resolution_resp.resolution_count == 0 ||
+	    msg->resolution_resp.resolution_count >
+	    SYNTHVID_MAX_RESOLUTION_COUNT) {
+		drm_err(dev, "Invalid resolution count: %d\n",
+			msg->resolution_resp.resolution_count);
 		return -ENODEV;
 	}
 
@@ -508,9 +511,13 @@ int hyperv_connect_vsp(struct hv_device *hdev)
 		ret = hyperv_get_supported_resolution(hdev);
 		if (ret)
 			drm_err(dev, "Failed to get supported resolution from host, use default\n");
-	} else {
+	}
+
+	if (!hv->screen_width_max) {
 		hv->screen_width_max = SYNTHVID_WIDTH_WIN8;
 		hv->screen_height_max = SYNTHVID_HEIGHT_WIN8;
+		hv->preferred_width = SYNTHVID_WIDTH_WIN8;
+		hv->preferred_height = SYNTHVID_HEIGHT_WIN8;
 	}
 
 	hv->mmio_megabytes = hdev->channel->offermsg.offer.mmio_megabytes;
-- 
2.47.3


* [PATCH v4 2/2] drm/hyperv: validate VMBus packet size in receive callback
From: Berkant Koc @ 2026-05-19 20:08 UTC (permalink / raw)
  To: Saurabh Sengar, Dexuan Cui, Long Li
  Cc: linux-hyperv, dri-devel, linux-kernel, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Michael Kelley, Thomas Zimmermann,
	Maarten Lankhorst, Maxime Ripard, Deepak Rawat
In-Reply-To: <cover.1779396074.git.me@berkoc.com>

hyperv_receive_sub() reads msg->vid_hdr.type and dispatches into one
of four message-type branches without knowing how many bytes the host
wrote into hv->recv_buf. The completion path then runs
memcpy(hv->init_buf, msg, VMBUS_MAX_PACKET_SIZE), so the consumer that
wakes on wait_for_completion_timeout() can read up to 16 KiB of
residue from a prior message as if it were the response payload.

Pass bytes_recvd into hyperv_receive_sub() and reject any packet that
does not cover the pipe + synthvid header. For the three
completion-driving types (SYNTHVID_VERSION_RESPONSE,
SYNTHVID_RESOLUTION_RESPONSE, SYNTHVID_VRAM_LOCATION_ACK) require the
type-specific payload before memcpy/complete, and apply the same rule
to SYNTHVID_FEATURE_CHANGE before reading is_dirt_needed.

SYNTHVID_RESOLUTION_RESPONSE is variable length: the host fills
resolution_count entries, not the full SYNTHVID_MAX_RESOLUTION_COUNT
array. Validate the fixed prefix first so resolution_count can be
read, bound it against the array, then require only the count-sized
array, so the shorter responses the host actually sends are accepted.

Only run the sub-handler when vmbus_recvpacket() returned success. The
memcpy length is bytes_recvd, which is bounded by VMBUS_MAX_PACKET_SIZE
only on a successful receive; on -ENOBUFS vmbus_recvpacket() instead
reports the required length, which can exceed hv->recv_buf, so copying
bytes_recvd would read and write past the 16 KiB buffers. Gating on the
success return keeps the copy bounded.

Rejected packets are reported via drm_err_ratelimited() rather than
silently dropped, matching the CoCo-hardened pattern in
hv_kvp_onchannelcallback().

Fixes: 76c56a5affeb ("drm/hyperv: Add DRM driver for hyperv synthetic video device")
Cc: stable@vger.kernel.org # 5.14+
Signed-off-by: Berkant Koc <me@berkoc.com>
Assisted-by: Claude:claude-opus-4-7 berkoc-pipeline
---
 drivers/gpu/drm/hyperv/hyperv_drm_proto.c | 63 +++++++++++++++++++++--
 1 file changed, 59 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/hyperv/hyperv_drm_proto.c b/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
index c3d0ff229..48054b607 100644
--- a/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
+++ b/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
@@ -420,26 +420,81 @@ static int hyperv_get_supported_resolution(struct hv_device *hdev)
 	return 0;
 }
 
-static void hyperv_receive_sub(struct hv_device *hdev)
+static void hyperv_receive_sub(struct hv_device *hdev, u32 bytes_recvd)
 {
 	struct hyperv_drm_device *hv = hv_get_drvdata(hdev);
 	struct synthvid_msg *msg;
+	size_t hdr_size;
 
 	if (!hv)
 		return;
 
+	hdr_size = sizeof(struct pipe_msg_hdr) +
+		   sizeof(struct synthvid_msg_hdr);
+	if (bytes_recvd < hdr_size) {
+		drm_err_ratelimited(&hv->dev,
+				    "synthvid packet too small for header: %u\n",
+				    bytes_recvd);
+		return;
+	}
+
 	msg = (struct synthvid_msg *)hv->recv_buf;
 
 	/* Complete the wait event */
 	if (msg->vid_hdr.type == SYNTHVID_VERSION_RESPONSE ||
 	    msg->vid_hdr.type == SYNTHVID_RESOLUTION_RESPONSE ||
 	    msg->vid_hdr.type == SYNTHVID_VRAM_LOCATION_ACK) {
-		memcpy(hv->init_buf, msg, VMBUS_MAX_PACKET_SIZE);
+		size_t need = hdr_size;
+
+		switch (msg->vid_hdr.type) {
+		case SYNTHVID_VERSION_RESPONSE:
+			need += sizeof(struct synthvid_version_resp);
+			break;
+		case SYNTHVID_RESOLUTION_RESPONSE:
+			/*
+			 * The resolution response is variable length: the host
+			 * fills resolution_count entries, not the full
+			 * SYNTHVID_MAX_RESOLUTION_COUNT array. Require the fixed
+			 * prefix first so resolution_count can be read, then
+			 * demand exactly the count-sized array.
+			 */
+			need += offsetof(struct synthvid_supported_resolution_resp,
+					 supported_resolution);
+			if (bytes_recvd < need)
+				break;
+			if (msg->resolution_resp.resolution_count >
+			    SYNTHVID_MAX_RESOLUTION_COUNT) {
+				drm_err_ratelimited(&hv->dev,
+						    "synthvid resolution count too large: %u\n",
+						    msg->resolution_resp.resolution_count);
+				return;
+			}
+			need += msg->resolution_resp.resolution_count *
+				sizeof(struct hvd_screen_info);
+			break;
+		case SYNTHVID_VRAM_LOCATION_ACK:
+			need += sizeof(struct synthvid_vram_location_ack);
+			break;
+		}
+		if (bytes_recvd < need) {
+			drm_err_ratelimited(&hv->dev,
+					    "synthvid packet too small for type %u: %u < %zu\n",
+					    msg->vid_hdr.type, bytes_recvd, need);
+			return;
+		}
+		memcpy(hv->init_buf, msg, bytes_recvd);
 		complete(&hv->wait);
 		return;
 	}
 
 	if (msg->vid_hdr.type == SYNTHVID_FEATURE_CHANGE) {
+		if (bytes_recvd < hdr_size +
+		    sizeof(struct synthvid_feature_change)) {
+			drm_err_ratelimited(&hv->dev,
+					    "synthvid feature change packet too small: %u\n",
+					    bytes_recvd);
+			return;
+		}
 		hv->dirt_needed = msg->feature_chg.is_dirt_needed;
 		if (hv->dirt_needed)
 			hyperv_hide_hw_ptr(hv->hdev);
@@ -464,9 +519,9 @@ static void hyperv_receive(void *ctx)
 		ret = vmbus_recvpacket(hdev->channel, recv_buf,
 				       VMBUS_MAX_PACKET_SIZE,
 				       &bytes_recvd, &req_id);
-		if (bytes_recvd > 0 &&
+		if (!ret && bytes_recvd > 0 &&
 		    recv_buf->pipe_hdr.type == PIPE_MSG_DATA)
-			hyperv_receive_sub(hdev);
+			hyperv_receive_sub(hdev, bytes_recvd);
 	} while (bytes_recvd > 0 && ret == 0);
 }
 
-- 
2.47.3


* Re: [PATCH v3 37/41] x86/kvmclock: Use TSC for sched_clock if it's constant and non-stop
From: Sean Christopherson @ 2026-05-21 21:01 UTC (permalink / raw)
  To: Dongli Zhang
  Cc: kvm, Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, x86, linux-coco, linux-hyperv, virtualization,
	linux-kernel, xen-devel, Kiryl Shutsemau, Paolo Bonzini,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
	Thomas Gleixner, John Stultz, Michael Kelley, Tom Lendacky,
	Nikunj A Dadhania, Thomas Gleixner, David Woodhouse
In-Reply-To: <c54fd01b-fe22-4c9c-8d5f-5b317de07a40@oracle.com>

On Thu, May 21, 2026, Dongli Zhang wrote:
> On 2026-05-15 12:19 PM, Sean Christopherson wrote:
> > Prefer the TSC over kvmclock for sched_clock if the TSC is constant,
> > nonstop, and not marked unstable via command line.  I.e. use the same
> > criteria as tweaking the clocksource rating so that TSC is preferred over
> > kvmclock.  Per the below comment from native_sched_clock(), sched_clock
> > is more tolerant of slop than clocksource; using TSC for clocksource but
> > not sched_clock makes little to no sense, especially now that KVM CoCo
> > guests with a trusted TSC use TSC, not kvmclock.
> > 
> >         /*
> >          * Fall back to jiffies if there's no TSC available:
> >          * ( But note that we still use it if the TSC is marked
> >          *   unstable. We do this because unlike Time Of Day,
> >          *   the scheduler clock tolerates small errors and it's
> >          *   very important for it to be as fast as the platform
> >          *   can achieve it. )
> >          */
> > 
> > The only advantage of using kvmclock is that doing so allows for early
> > and common detection of PVCLOCK_GUEST_STOPPED, but that code has been
> > broken for over two years with nary a complaint, i.e. it can't be
> > _that_ valuable.  And as above, certain types of KVM guests are losing
> > the functionality regardless, i.e. acknowledging PVCLOCK_GUEST_STOPPED
> > needs to be decoupled from sched_clock() no matter what.
> 
> Has it been broken for two years because of pvclock_clocksource_read_nowd()?

Yep.  Because pvclock_clocksource_read_nowd() ignores PVCLOCK_GUEST_STOPPED, the
flag only ever gets recognized when the kernel reads WALL_CLOCK, which AFAICT
only happens at initial boot, and during suspend and resume.

* Re: [PATCH v3 02/41] x86/tsc: Add helper to register CPU and TSC freq calibration routines
From: David Woodhouse @ 2026-05-21 21:01 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Kiryl Shutsemau, Paolo Bonzini, K. Y. Srinivasan, Haiyang Zhang,
	Wei Liu, Dexuan Cui, Long Li, Ajay Kaher, Alexey Makhalov,
	Jan Kiszka, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Juergen Gross, Daniel Lezcano, Thomas Gleixner, John Stultz,
	Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
	linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
	Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <ag9wz3RiJOtVZrK0@google.com>

On Thu, 2026-05-21 at 13:53 -0700, Sean Christopherson wrote:
> 
> E.g. this is what I've got for the early flow.  Testing now. 
> 
>   void __init tsc_early_init(void)
>   {
> 	unsigned int known_cpu_khz = 0, known_tsc_khz = 0;
> 
> 	if (!boot_cpu_has(X86_FEATURE_TSC))
> 		return;
> 	/* Don't change UV TSC multi-chassis synchronization */
> 	if (is_early_uv_system())
> 		return;
> 
> 	if (x86_init.hyper.get_cpu_khz)
> 		known_cpu_khz = x86_init.hyper.get_cpu_khz();
> 
> 	if (tsc_early_khz)
> 		known_tsc_khz = tsc_early_khz;
> 	else if (cc_platform_has(CC_ATTR_GUEST_SNP_SECURE_TSC))
> 		known_tsc_khz = snp_secure_tsc_init();
> 	else if (boot_cpu_has(X86_FEATURE_TDX_GUEST))
> 		known_tsc_khz = tdx_tsc_init();
> 
> 	/*
> 	 * If the TSC frequency is still unknown, i.e. not provided by the user
> 	 * or by trusted firmware, try to get it from the hypervisor (which is
> 	 * untrusted when running as a CoCo guest).
> 	 */
> 	if (!known_tsc_khz && x86_init.hyper.get_tsc_khz)
> 		known_tsc_khz = x86_init.hyper.get_tsc_khz();
> 
> 	if (known_tsc_khz)
> 		setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
> 
> 	if (!determine_cpu_tsc_frequencies(true, known_cpu_khz, known_tsc_khz))
> 		return;
> 	tsc_enable_sched_clock();
>   }

That seems reasonable. Where does the call to native_calibrate_tsc()
happen; is that from determine_cpu_tsc_frequencies()? 

* Re: [PATCH v3 02/41] x86/tsc: Add helper to register CPU and TSC freq calibration routines
From: Sean Christopherson @ 2026-05-21 21:17 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Kiryl Shutsemau, Paolo Bonzini, K. Y. Srinivasan, Haiyang Zhang,
	Wei Liu, Dexuan Cui, Long Li, Ajay Kaher, Alexey Makhalov,
	Jan Kiszka, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Juergen Gross, Daniel Lezcano, Thomas Gleixner, John Stultz,
	Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
	linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
	Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <342098f6bfe1e4c7b233433df8f79713b4220614.camel@infradead.org>

On Thu, May 21, 2026, David Woodhouse wrote:
> On Thu, 2026-05-21 at 13:53 -0700, Sean Christopherson wrote:
> > 
> > E.g. this is what I've got for the early flow.  Testing now. 
> > 
> >   void __init tsc_early_init(void)
> >   {
> > 	unsigned int known_cpu_khz = 0, known_tsc_khz = 0;
> > 
> > 	if (!boot_cpu_has(X86_FEATURE_TSC))
> > 		return;
> > 	/* Don't change UV TSC multi-chassis synchronization */
> > 	if (is_early_uv_system())
> > 		return;
> > 
> > 	if (x86_init.hyper.get_cpu_khz)
> > 		known_cpu_khz = x86_init.hyper.get_cpu_khz();
> > 
> > 	if (tsc_early_khz)
> > 		known_tsc_khz = tsc_early_khz;
> > 	else if (cc_platform_has(CC_ATTR_GUEST_SNP_SECURE_TSC))
> > 		known_tsc_khz = snp_secure_tsc_init();
> > 	else if (boot_cpu_has(X86_FEATURE_TDX_GUEST))
> > 		known_tsc_khz = tdx_tsc_init();
> > 
> > 	/*
> > 	 * If the TSC frequency is still unknown, i.e. not provided by the user
> > 	 * or by trusted firmware, try to get it from the hypervisor (which is
> > 	 * untrusted when running as a CoCo guest).
> > 	 */
> > 	if (!known_tsc_khz && x86_init.hyper.get_tsc_khz)
> > 		known_tsc_khz = x86_init.hyper.get_tsc_khz();
> > 
> > 	if (known_tsc_khz)
> > 		setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
> > 
> > 	if (!determine_cpu_tsc_frequencies(true, known_cpu_khz, known_tsc_khz))
> > 		return;
> > 	tsc_enable_sched_clock();
> >   }
> 
> That seems reasonable. Where does the call to native_calibrate_tsc()
> happen; is that from determine_cpu_tsc_frequencies()? 

Yep.

static bool __init determine_cpu_tsc_frequencies(bool early,
						 unsigned int known_cpu_khz,
						 unsigned int known_tsc_khz)
{
	/* Make sure that cpu and tsc are not already calibrated */
	WARN_ON(cpu_khz || tsc_khz);

	if (early) {
		/*
		 * Early CPU calibration can only use methods that are available
		 * early in boot (obviously).
		 */
		if (known_cpu_khz)
			cpu_khz = known_cpu_khz;
		else
			cpu_khz = native_calibrate_cpu_early();
		if (known_tsc_khz)
			tsc_khz = known_tsc_khz;
		else
			tsc_khz = native_calibrate_tsc();
	} else {
		cpu_khz = pit_hpet_ptimer_calibrate_cpu();
	}

	...

* RE: [PATCH 1/1] mshv: Add conditional VMBus dependency
From: Michael Kelley @ 2026-05-21 21:17 UTC (permalink / raw)
  To: Arnd Bergmann, Michael Kelley, K. Y. Srinivasan, Haiyang Zhang,
	Wei Liu, Dexuan Cui, longli@microsoft.com, Jork Loeser,
	linux-hyperv@vger.kernel.org
  Cc: linux-kernel@vger.kernel.org, hamzamahfooz@linux.microsoft.com
In-Reply-To: <b3c6144a-beb1-44ff-9a7d-bad61a1b3829@app.fastmail.com>

From: Arnd Bergmann <arnd@arndb.de> Sent: Thursday, May 21, 2026 1:16 PM
> 
> On Thu, May 21, 2026, at 18:49, Michael Kelley wrote:
> >
> > Existing code ensures that the VMBus driver loads first if it is
> > built-in. The VMBus driver uses subsys_initcall(), which is
> > initcall level 4. The MSHV root driver uses module_init(), which
> > becomes device_initcall() when built-in, and device_initcall() is
> > initcall level 6.
> >
> > Reported-by: Arnd Bergmann <arnd@arndb.de>
> > Closes: https://lore.kernel.org/all/20260520074044.923728-1-arnd@kernel.org/
> > Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> 
> Looks good to me, thanks for fixing it!
> 
> Acked-by: Arnd Bergmann <arnd@arndb.de>
> 
> >  	/*
> >  	 * VMBus owns SIMP/SIEFP/SCONTROL when it is active.
> >  	 * See hv_hyp_synic_enable_regs() for that initialization.
> >  	 */
> > -	bool vmbus_active = hv_vmbus_exists();
> > +#if IS_ENABLED(CONFIG_HYPERV_VMBUS)
> > +	vmbus_active = hv_vmbus_exists();
> > +#endif
> 
> I would usually write this as
> 
>         if (IS_ENABLED(CONFIG_HYPERV_VMBUS))
>                   vmbus_active = hv_vmbus_exists();
> 
> for readability, since the hv_vmbus_exists() declaration is still
> visible and the IS_ENABLED() check avoids the link failure.
> 

I thought about doing that, but wasn't sure it would work. There
are nuances of #ifdef vs. #if IS_ENABLED() vs. if (IS_ENABLED())
that I haven't learned. :-(
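
The nuances can be shown with a small self-contained demo. The macros are copied from include/linux/kconfig.h. CONFIG_DEMO_* and demo_* are hypothetical. A plain #ifdef CONFIG_X is false for =m, because Kconfig then defines only CONFIG_X_MODULE. "#if IS_ENABLED()" drops the code in the preprocessor. "if (IS_ENABLED())" keeps the call visible and type-checked, and it relies on the optimizing compiler discarding the constant-false branch so that no reference to the missing symbol reaches the linker.

```c
#include <assert.h>

/* IS_ENABLED() machinery as found in include/linux/kconfig.h. */
#define __ARG_PLACEHOLDER_1 0,
#define __take_second_arg(__ignored, val, ...) val

#define __or(x, y)			___or(x, y)
#define ___or(x, y)			____or(__ARG_PLACEHOLDER_##x, y)
#define ____or(arg1_or_junk, y)		__take_second_arg(arg1_or_junk 1, y)

#define __is_defined(x)			___is_defined(x)
#define ___is_defined(val)		____is_defined(__ARG_PLACEHOLDER_##val)
#define ____is_defined(arg1_or_junk)	__take_second_arg(arg1_or_junk 1, 0)

#define IS_BUILTIN(option) __is_defined(option)
#define IS_MODULE(option) __is_defined(option##_MODULE)
#define IS_ENABLED(option) __or(IS_BUILTIN(option), IS_MODULE(option))

/* Hypothetical .config: DEMO_BUILTIN=y, DEMO_MOD=m, DEMO_OFF not set. */
#define CONFIG_DEMO_BUILTIN 1
#define CONFIG_DEMO_MOD_MODULE 1

/* A plain #ifdef misses =m, because only CONFIG_DEMO_MOD_MODULE exists. */
#ifdef CONFIG_DEMO_MOD
#define DEMO_IFDEF_SEES_MOD 1
#else
#define DEMO_IFDEF_SEES_MOD 0
#endif

static int calls;

/* Hypothetical stand-in for hv_vmbus_exists(); defined so the demo links. */
static int demo_vmbus_exists(void)
{
	calls++;
	return 1;
}

/* Same shape as the suggested code: compiled and type-checked, never run. */
static int demo_active_off(void)
{
	int vmbus_active = 0;

	if (IS_ENABLED(CONFIG_DEMO_OFF))
		vmbus_active = demo_vmbus_exists();
	return vmbus_active;
}

/* A =m option counts as enabled, so here the call does happen. */
static int demo_active_mod(void)
{
	int vmbus_active = 0;

	if (IS_ENABLED(CONFIG_DEMO_MOD))
		vmbus_active = demo_vmbus_exists();
	return vmbus_active;
}
```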

I'll wait a few days to see if any comments come in from Jork
or other MSFT folks, and then spin a v2 with your change
so the cleaner version is what goes upstream.

Thanks!

Michael

* Re: [PATCH 1/1] mshv: Add conditional VMBus dependency
From: Jork Loeser @ 2026-05-21 21:19 UTC (permalink / raw)
  To: mhklinux
  Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel,
	arnd, hamzamahfooz
In-Reply-To: <20260521164921.1995-1-mhklkml@zohomail.com>

On Thu, 21 May 2026, Michael Kelley wrote:

> From: Michael Kelley <mhklinux@outlook.com>

> * Add #ifdefs around MSHV SynIC calls to hv_vmbus_exists(). When

Could as well do an empty definition of hv_vmbus_exists() if VMBUS is not 
configured, no?

> diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> index 2d0b3fcb0ff8..aa11bcefddf2 100644
> --- a/drivers/hv/Kconfig
> +++ b/drivers/hv/Kconfig
> @@ -74,6 +74,7 @@ config MSHV_ROOT
> 	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
> 	# no particular order, making it impossible to reassemble larger pages
> 	depends on PAGE_SIZE_4KB
> +	depends on HYPERV_VMBUS if HYPERV_VMBUS

Nice, thanks!

Reviewed-by: Jork Loeser <jloeser@linux.microsoft.com>

* Re: [PATCH v4 1/2] drm/hyperv: validate resolution_count and fix WIN8 fallback
From: sashiko-bot @ 2026-05-21 21:26 UTC (permalink / raw)
  To: Berkant Koc; +Cc: linux-hyperv
In-Reply-To: <6945b22419c7d404b4954a113de2ac9c900dba93.1779396074.git.me@berkoc.com>

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

Pre-existing issues:
- [High] Missing Completion Reinitialization causes state machine desynchronization.
- [High] Missing Message Type Validation allows parsing of incorrect response payloads.
--

commit e5968f0d509475850afc95a1bf9a982ea5002949
Author: Berkant Koc <me@berkoc.com>

drm/hyperv: validate resolution_count and fix WIN8 fallback

This commit bounds resolution_count against the array size to prevent walking
past the supported_resolution array in the parse loop. It also provides a
single post-probe fallback to ensure userspace framebuffers are not rejected
when the WIN10 resolution probe fails.

> diff --git a/drivers/gpu/drm/hyperv/hyperv_drm_proto.c b/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
> index 051ecc5268321..c3d0ff229e3d0 100644
> --- a/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
> +++ b/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
> @@ -391,8 +391,11 @@ static int hyperv_get_supported_resolution(struct hv_device *hdev)
>  		return -ETIMEDOUT;
>  	}

[Severity: High]
This is a pre-existing issue, but is it possible for the state machine to
become desynchronized here if a timeout occurs?

The driver uses wait_for_completion_timeout() for synchronous VMBUS requests.
If the host is slow and the wait times out, the hv->wait completion counter is
not decremented since it returns early.

If the host responds after the timeout, hyperv_receive_sub() calls
complete(&hv->wait), incrementing the counter. Because the driver never calls
reinit_completion(&hv->wait) before sending a new request, the next transaction
could consume the late completion from the previous request immediately.

For example, a timeout during hyperv_get_supported_resolution() followed by a
late response could cause a subsequent hyperv_update_vram_location() to
complete immediately before the host has actually acknowledged the VRAM
location, which might lead to memory corruption or VM crashes.
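
The sequence can be reproduced with a userspace model of a completion's counter. This is not the kernel API; the model_* names are hypothetical, and the wait is non-blocking. Resetting before the next send clears this particular stale wakeup. It does not by itself cover a late reply that lands after the reset but before the real one, which is where the type check raised next comes in.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of a completion's done counter; not the kernel API. */
struct model_completion {
	unsigned int done;
};

static void model_complete(struct model_completion *c)
{
	c->done++;			/* like complete() */
}

static void model_reinit(struct model_completion *c)
{
	c->done = 0;			/* like reinit_completion() */
}

/* Non-blocking stand-in for wait_for_completion_timeout(). */
static bool model_wait(struct model_completion *c)
{
	if (!c->done)
		return false;		/* would time out */
	c->done--;
	return true;
}

/*
 * Request A times out, A's reply arrives late, then request B is sent.
 * Returns true if B's wait succeeds before any reply to B exists.
 */
static bool model_stale_wakeup(bool reinit_before_send)
{
	struct model_completion wait = { 0 };

	(void)model_wait(&wait);	/* request A: times out */
	model_complete(&wait);		/* late reply to A */
	if (reinit_before_send)
		model_reinit(&wait);
	/* request B goes out here; its reply has not arrived */
	return model_wait(&wait);
}
```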

>  
> -	if (msg->resolution_resp.resolution_count == 0) {
> -		drm_err(dev, "No supported resolutions\n");
> +	if (msg->resolution_resp.resolution_count == 0 ||
> +	    msg->resolution_resp.resolution_count >
> +	    SYNTHVID_MAX_RESOLUTION_COUNT) {

[Severity: High]
This isn't a bug introduced by this patch, but should the code verify the
message type before accessing the payload?

The driver uses a single wait completion (hv->wait) and a single response
buffer (hv->init_buf) for multiple VMBUS message types. In
hyperv_receive_sub(), receiving any valid response type (e.g.,
SYNTHVID_VERSION_RESPONSE, SYNTHVID_RESOLUTION_RESPONSE,
SYNTHVID_VRAM_LOCATION_ACK) will trigger complete(&hv->wait).

Functions waiting for responses, like hyperv_get_supported_resolution(), appear
to assume the payload in hv->init_buf is the correct response type without
verifying msg->vid_hdr.type.

If the state machine is desynchronized (e.g., due to a previous timeout) or a
buggy host sends an unsolicited response, could the driver misinterpret the
payload by reading fields like resolution_count from stale memory or data
belonging to a different response type?
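
A minimal sketch of the check being suggested follows. The waiter validates the type before reading any payload. The enum values and struct are hypothetical; the driver's SYNTHVID_* codes and union layout are not reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical type codes; the driver's SYNTHVID_* values are not used. */
enum model_type {
	MODEL_VERSION_RESPONSE = 1,
	MODEL_RESOLUTION_RESPONSE,
	MODEL_VRAM_LOCATION_ACK,
};

/* Simplified stand-in for the header plus a one-field payload in init_buf. */
struct model_msg {
	uint32_t type;
	uint32_t payload;
};

/* After the wait returns, only trust the payload if the type matches. */
static int model_consume(const struct model_msg *init_buf, uint32_t expected,
			 uint32_t *out)
{
	if (init_buf->type != expected)
		return -EPROTO;		/* stale or unsolicited reply */
	*out = init_buf->payload;
	return 0;
}
```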

> +		drm_err(dev, "Invalid resolution count: %d\n",
> +			msg->resolution_resp.resolution_count);
>  		return -ENODEV;
>  	}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1779396074.git.me@berkoc.com?part=1

* Re: [PATCH v3 02/41] x86/tsc: Add helper to register CPU and TSC freq calibration routines
From: David Woodhouse @ 2026-05-21 21:37 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Kiryl Shutsemau, Paolo Bonzini, K. Y. Srinivasan, Haiyang Zhang,
	Wei Liu, Dexuan Cui, Long Li, Ajay Kaher, Alexey Makhalov,
	Jan Kiszka, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Juergen Gross, Daniel Lezcano, Thomas Gleixner, John Stultz,
	Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
	linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
	Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <ag92Ze_FADmL1llo@google.com>

On Thu, 2026-05-21 at 14:17 -0700, Sean Christopherson wrote:
>  
> > That seems reasonable. Where does the call to native_calibrate_tsc()
> > happen; is that from determine_cpu_tsc_frequencies()? 
> 
> Yep.

Great, thanks.

> static bool __init determine_cpu_tsc_frequencies(bool early,
> 						 unsigned int known_cpu_khz,
> 						 unsigned int known_tsc_khz)
> {
> 	/* Make sure that cpu and tsc are not already calibrated */
> 	WARN_ON(cpu_khz || tsc_khz);
> 
> 	if (early) {
> 		/*
> 		 * Early CPU calibration can only use methods that are available
> 		 * early in boot (obviously).
> 		 */
> 		if (known_cpu_khz)
> 			cpu_khz = known_cpu_khz;
> 		else
> 			cpu_khz = native_calibrate_cpu_early();
> 		if (known_tsc_khz)
> 			tsc_khz = known_tsc_khz;
> 		else
> 			tsc_khz = native_calibrate_tsc();
> 	} else {
> 		cpu_khz = pit_hpet_ptimer_calibrate_cpu();
> 	}


If, after all that, we still end up in the case where we *do* have to
calibrate it against a legacy timer (which sadly IIRC is the case even
on some fairly modern AMD generations), could we round the answer?

We currently have *far* more precision than accuracy, leading to values
like 2399997kHz which change every boot (and end up being what gets
*advertised* to guests on such a host... and then unless we're careful
to avoid it, we end up trying to *scale* a different host's TSC down
from 2399998 to 2399997 for a guest which is migrated from the first
host...)

We should just fix them (e.g. to 2400000kHz) and let NTP sort them out.

Something like "round to the nearest MHz if that's within ±10PPM"?
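
A sketch of the proposed rounding rule follows. round_tsc_khz() is hypothetical, not an existing kernel helper; the 10 PPM threshold is taken from the proposal. With it, both hosts in the example report 2400000 kHz.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Snap a calibrated frequency (kHz) to the nearest whole MHz when the
 * correction is within +/-10 PPM; otherwise keep the measured value.
 */
static uint32_t round_tsc_khz(uint32_t khz)
{
	uint64_t nearest = ((uint64_t)khz + 500) / 1000 * 1000;
	uint64_t diff = nearest > khz ? nearest - khz : khz - nearest;

	/* diff / nearest <= 10e-6  <=>  diff * 1000000 <= 10 * nearest */
	if (nearest && diff * 1000000 <= 10 * nearest)
		return (uint32_t)nearest;
	return khz;
}
```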



* RE: [PATCH 1/1] mshv: Add conditional VMBus dependency
From: Michael Kelley @ 2026-05-21 21:41 UTC (permalink / raw)
  To: Jork Loeser
  Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, longli@microsoft.com,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	arnd@arndb.de, hamzamahfooz@linux.microsoft.com
In-Reply-To: <79f77f98-f91-4cf-47ca-c986faed5055@linux.microsoft.com>

From: Jork Loeser <jloeser@linux.microsoft.com> Sent: Thursday, May 21, 2026 2:20 PM
> 
> On Thu, 21 May 2026, Michael Kelley wrote:
> 
> > From: Michael Kelley <mhklinux@outlook.com>
> 
> > * Add #ifdefs around MSHV SynIC calls to hv_vmbus_exists(). When
> 
> Could as well do an empty definition of hv_vmbus_exists() if VMBUS is not
> configured, no?

Yes, indeed. I would have done that if there were more than 2 places
where hv_vmbus_exists() is called. For me, having exactly 2 places was
on the tipping point of testing CONFIG_HYPERV_VMBUS inline vs.
adding the test in a .h file.

Thinking about it more, I'll try the .h file route in a v2. That way
mshv_synic.c doesn't have to be touched at all.
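
The .h route could look roughly like the sketch below. MODEL_HYPERV_VMBUS is a hypothetical toggle standing in for IS_ENABLED(CONFIG_HYPERV_VMBUS). The bool(void) signature matches the quoted diff, but the header it would live in is not specified here. When VMBus is enabled, callers still reference the exported symbol, so the module-load dependency described in the commit message is unchanged.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toggle standing in for IS_ENABLED(CONFIG_HYPERV_VMBUS). */
#define MODEL_HYPERV_VMBUS 0

#if MODEL_HYPERV_VMBUS
/* Real symbol, exported by the VMBus driver (signature assumed). */
bool hv_vmbus_exists(void);
#else
/* No VMBus driver: behave as if hv_vmbus_exists() returned false. */
static inline bool hv_vmbus_exists(void)
{
	return false;
}
#endif

/* A caller in the SynIC code needs no conditional compilation of its own. */
static bool model_synic_vmbus_active(void)
{
	return hv_vmbus_exists();
}
```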

Michael

> 
> > diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> > index 2d0b3fcb0ff8..aa11bcefddf2 100644
> > --- a/drivers/hv/Kconfig
> > +++ b/drivers/hv/Kconfig
> > @@ -74,6 +74,7 @@ config MSHV_ROOT
> > 	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
> > 	# no particular order, making it impossible to reassemble larger pages
> > 	depends on PAGE_SIZE_4KB
> > +	depends on HYPERV_VMBUS if HYPERV_VMBUS
> 
> Nice, thanks!
> 
> Reviewed-by: Jork Loeser <jloeser@linux.microsoft.com>


* Re: [PATCH v4 2/2] drm/hyperv: validate VMBus packet size in receive callback
From: sashiko-bot @ 2026-05-21 22:06 UTC (permalink / raw)
  To: Berkant Koc; +Cc: linux-hyperv
In-Reply-To: <6e5d1d57a3afc4c5ea0d2a3d62be58c90741a869.1779396074.git.me@berkoc.com>

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:

Pre-existing issues:
- [High] The completion `hv->wait` is not re-initialized before sending a new request, leading to premature completions and use of stale data.
- [High] The wait thread unconditionally trusts that the received message in `hv->init_buf` matches the expected response type without validating `vid_hdr.type`.
- [High] If the host sends a packet larger than VMBUS_MAX_PACKET_SIZE, the channel becomes permanently wedged.
- [High] Unprotected concurrent access to `hv->init_buf` causes outbound request corruption.
--

commit 6ae2d5d611e1bfae5ccaf2facc0f88f11136806d
Author: Berkant Koc <me@berkoc.com>
Date:   Tue May 19 20:08:53 2026 +0000

drm/hyperv: validate VMBus packet size in receive callback

This commit validates the size of received VMBus packets to prevent
reading past the end of the receive buffer. It requires specific
payload sizes based on the message type. It also prevents copying
when vmbus_recvpacket() returns an error.

Cc: stable@vger.kernel.org # 5.14+

> diff --git a/drivers/gpu/drm/hyperv/hyperv_drm_proto.c b/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
> index c3d0ff229e3d0..48054b6078f54 100644
> --- a/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
> +++ b/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
> @@ -420,26 +420,81 @@ static int hyperv_get_supported_resolution(struct hv_device *hdev)

[ ... ]

>  	/* Complete the wait event */
>  	if (msg->vid_hdr.type == SYNTHVID_VERSION_RESPONSE ||
>  	    msg->vid_hdr.type == SYNTHVID_RESOLUTION_RESPONSE ||
>  	    msg->vid_hdr.type == SYNTHVID_VRAM_LOCATION_ACK) {
> -		memcpy(hv->init_buf, msg, VMBUS_MAX_PACKET_SIZE);
> +		size_t need = hdr_size;
> +

[ ... ]

> +		if (bytes_recvd < need) {
> +			drm_err_ratelimited(&hv->dev,
> +					    "synthvid packet too small for type %u: %u < %zu\n",
> +					    msg->vid_hdr.type, bytes_recvd, need);
> +			return;
> +		}
> +		memcpy(hv->init_buf, msg, bytes_recvd);

[Severity: High]
This isn't a bug introduced by this patch, but does the receive callback
overwrite hv->init_buf concurrently while process-context functions are using
it to construct outbound requests?

For example, if hyperv_get_supported_resolution() is building a request in
hv->init_buf, and a delayed response from a previous request arrives, this
memcpy() will overwrite the buffer.
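A minimal sketch of one way to remove the overlap this question raises, assuming the driver could keep separate request and response buffers (the struct and function names are hypothetical). The receive path writes only to `resp_buf`, so a late response can no longer clobber a request being built in `req_buf`. This single-threaded model does not address matching responses to requests.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 64

/* Hypothetical device state with separate request and response buffers. */
struct dev_state {
	uint8_t req_buf[BUF_SIZE];  /* written only by the request path */
	uint8_t resp_buf[BUF_SIZE]; /* written only by the receive callback */
};

/* Process context: build an outbound request in req_buf. */
static void build_request(struct dev_state *s, uint8_t type)
{
	memset(s->req_buf, 0, BUF_SIZE);
	s->req_buf[0] = type;
}

/* Receive callback: store the incoming response in resp_buf only. */
static void on_receive(struct dev_state *s, const uint8_t *pkt, size_t len)
{
	if (len > BUF_SIZE)
		return; /* drop oversized packets rather than overflow */
	memcpy(s->resp_buf, pkt, len);
}
```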

[Severity: High]
This is a pre-existing issue, but does the wait thread unconditionally trust
that the message in hv->init_buf matches its expected response type?

Since this callback completes the wait for multiple message types, an unexpected
or delayed message could cause the waiting thread to awaken and parse overlapping
union fields incorrectly.
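A sketch of the check this question implies, with hypothetical types: before reading a type-specific union member, the waiting side confirms that the buffered message is the response it asked for and treats anything else as stale or unexpected.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical response types, for illustration only. */
enum resp_type {
	RESP_VERSION = 1,
	RESP_RESOLUTION = 2,
	RESP_VRAM_ACK = 3,
};

/* Hypothetical message whose payload fields overlap in a union. */
struct resp {
	uint32_t type;
	union {
		struct { uint32_t version; uint8_t accepted; } ver;
		struct { uint8_t count; } res;
	};
};

/*
 * Returns 1 if the version was accepted, 0 if rejected, and -1 if the
 * buffered message is not a version response at all.
 */
static int parse_version_resp(const struct resp *r)
{
	if (r->type != RESP_VERSION)
		return -1; /* stale or unexpected message: do not read r->ver */
	return r->ver.accepted ? 1 : 0;
}
```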

>  		complete(&hv->wait);

[Severity: High]
This is a pre-existing issue, but is the completion hv->wait ever
re-initialized before being reused?

If a request times out and the host responds later, complete() will be called,
leaving the completion signaled. The next request calling
wait_for_completion_timeout() will return immediately, leading to
desynchronization. Should callers use reinit_completion() before sending a
request?
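The stale-signal sequence described above can be modelled with a counter, assuming only the counting behaviour of the kernel's `struct completion`: `complete()` increments `done`, a successful wait consumes one, and `reinit_completion()` resets it to zero. The names are hypothetical and nothing here blocks.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Single-threaded model of a completion: complete() records one "done"
 * event and a wait consumes it. This models only the counting, not
 * the blocking, of the kernel's struct completion.
 */
struct model_completion {
	unsigned int done;
};

static void model_complete(struct model_completion *c)
{
	c->done++;
}

static void model_reinit(struct model_completion *c)
{
	c->done = 0;
}

/* Returns true if the wait is satisfied immediately (no real blocking). */
static bool model_try_wait(struct model_completion *c)
{
	if (c->done == 0)
		return false;
	c->done--;
	return true;
}
```

A response that arrives after the reset can still be taken for the new one, so resetting the completion complements, rather than replaces, the review's separate question about validating `vid_hdr.type`.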

>  		return;
>  	}

[ ... ]

> @@ -464,9 +519,9 @@ static void hyperv_receive(void *ctx)
>  		ret = vmbus_recvpacket(hdev->channel, recv_buf,
>  				       VMBUS_MAX_PACKET_SIZE,
>  				       &bytes_recvd, &req_id);
> -		if (bytes_recvd > 0 &&
> +		if (!ret && bytes_recvd > 0 &&

[Severity: High]
This is a pre-existing issue, but if vmbus_recvpacket() returns -ENOBUFS
because the host sends a packet larger than VMBUS_MAX_PACKET_SIZE, does the
channel become permanently wedged?

The loop condition will exit, and the oversized packet remains in the ring
buffer. Subsequent interrupts will continue to fail with -ENOBUFS, resulting
in a permanent channel stall.
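The stall described above can be modelled as a ring whose read position advances only when the head packet fits in the caller's buffer. This is a hypothetical model of the behaviour as the review states it, not VMBus code; there is no wraparound, for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical ring holding packet lengths in arrival order. */
struct model_ring {
	size_t lens[8];
	size_t head;
	size_t tail;
};

/*
 * Model of a receive call: if the next packet does not fit in the
 * caller's buffer, report -ENOBUFS and leave it in the ring, so the
 * read position does not advance.
 */
static int model_recv(struct model_ring *r, size_t buf_len, size_t *out_len)
{
	*out_len = 0;
	if (r->head == r->tail)
		return 0;              /* ring empty */
	if (r->lens[r->head] > buf_len)
		return -ENOBUFS;       /* packet stays at the head */
	*out_len = r->lens[r->head];
	r->head++;
	return 0;
}
```

Repeated calls with the same buffer size make no progress; only a large enough buffer, or some way to discard the oversized packet, unblocks the reader. The review does not say which remedy the driver should adopt.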

>  		    recv_buf->pipe_hdr.type == PIPE_MSG_DATA)
> -			hyperv_receive_sub(hdev);
> +			hyperv_receive_sub(hdev, bytes_recvd);
>  	} while (bytes_recvd > 0 && ret == 0);
>  }

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1779396074.git.me@berkoc.com?part=2


* Re: [PATCH V3 09/11] x86/hyperv: Implement Hyper-V virtual IOMMU
From: Mukesh R @ 2026-05-21 22:37 UTC
  To: Jason Gunthorpe
  Cc: hpa, robin.murphy, robh, wei.liu, mhklinux, muislam, namjain,
	magnuskulke, anbelski, linux-kernel, linux-hyperv, iommu,
	linux-pci, linux-arch, kys, haiyangz, decui, longli, tglx, mingo,
	bp, dave.hansen, x86, joro, will, lpieralisi, kwilczynski,
	bhelgaas, arnd, jacob.pan
In-Reply-To: <20260515182322.GI7702@ziepe.ca>

On 5/15/26 11:23, Jason Gunthorpe wrote:
> On Mon, May 11, 2026 at 07:02:57PM -0700, Mukesh R wrote:
>> +static struct iommu_domain *hv_iommu_domain_alloc_paging(struct device *dev)
>> +{
>> +	struct hv_domain *hvdom;
>> +	int rc;
>> +
>> +	if (hv_l1vh_partition() && !hv_curr_thread_is_vmm()) {
>> +		pr_err("Hyper-V: l1vh iommu does not support host devices\n");
>> +		return NULL;
>> +	}
>> +
>> +	hvdom = kzalloc(sizeof(struct hv_domain), GFP_KERNEL);
>> +	if (hvdom == NULL)
>> +		return NULL;
>> +
>> +	spin_lock_init(&hvdom->mappings_lock);
>> +	hvdom->mappings_tree = RB_ROOT_CACHED;
>> +
>> +	/* Called under iommu group mutex, so single threaded */
>> +	if (++unique_id == HV_DEVICE_DOMAIN_ID_S2_NULL)   /* ie, UINTMAX */
>> +		goto out_err;
>> +
>> +	hvdom->domid_num = unique_id;
>> +	hvdom->partid = hv_get_current_partid();
>> +	hvdom->iommu_dom.geometry = default_geometry;
>> +	hvdom->iommu_dom.pgsize_bitmap = HV_IOMMU_PGSIZES;
>> +
>> +	/* For guests, by default we do direct attaches, so no domain in hyp */
>> +	if (hv_dom_owner_is_vmm(hvdom) && !hv_no_attdev)
>> +		hvdom->attached_dom = true;
> 
> What are you thinking sending something like this?!?!?
> 
> The function is called *alloc domain PAGING*, it does not, and can not
> allocate weird "special" domains that are not PAGING domains. I just
> spent a long time removing all this kind of craziness from drivers.
>
> There are a lot of other things I don't like in this patch, but this is
> too much.
>
> You have to drop this "direct attach" idea from the first iteration,
> Linux can't do it without a lot more work, you should start with the
> basic paging domain mode.
> 
> Jason

Yeah, agree. There was some ambivalence about whether this could be a stopgap
solution until the iommufd based solution is fully designed.

I'll remove the "direct attach" stuff and resend with just basic paging
domain mode. Thanks for the review.

Thanks,
-Mukesh



* Re: [PATCH net-next v9] net: mana: Expose hardware diagnostic info via debugfs
From: Simon Horman @ 2026-05-22  7:12 UTC
  To: Erni Sri Satya Vennela
  Cc: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
	edumazet, kuba, pabeni, kotaranov, shradhagupta, dipayanroy, kees,
	linux-hyperv, netdev, linux-kernel, linux-rdma
In-Reply-To: <20260519064621.772154-1-ernis@linux.microsoft.com>

On Mon, May 18, 2026 at 11:46:10PM -0700, Erni Sri Satya Vennela wrote:
> Add debugfs entries to expose hardware configuration and diagnostic
> information that aids in debugging driver initialization and runtime
> operations without adding noise to dmesg.
> 
> The debugfs directory for each PCI device is named using pci_name()
> (the unique BDF address), and its creation and removal is integrated
> into mana_gd_setup() and mana_gd_cleanup_device() respectively, so
> that all callers (probe, remove, suspend, resume, shutdown) share a
> single code path.
> 
> Device-level entries (under /sys/kernel/debug/mana/<BDF>/):
>   - num_msix_usable, max_num_queues: Max resources from hardware
>   - gdma_protocol_ver, pf_cap_flags1: VF version negotiation results
>   - num_vports, bm_hostmode: Device configuration
> 
> Per-vPort entries (under /sys/kernel/debug/mana/<BDF>/vportN/):
>   - port_handle: Hardware vPort handle
>   - max_sq, max_rq: Max queues from vPort config
>   - indir_table_sz: Indirection table size
>   - steer_rx, steer_rss, steer_update_tab, steer_cqe_coalescing:
>     Last applied steering configuration parameters
> 
> Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
> ---
> Change in v9:
> * Change steer_update_tab type from u32 to bool and use
>   debugfs_create_bool() accordingly
> * Guard debugfs_lookup_and_remove() calls in mana_remove() with a
>   NULL check on gc->mana_pci_debugfs
> * Fix mana_gd_resume() RDMA failure unwind: call mana_rdma_remove()
>   to undo partial RDMA state and return err, instead of
>   mana_remove(true) + mana_gd_cleanup_device(), avoiding a UAF
>   where gf_stats_work could fire against an already-destroyed HWC

Reviewed-by: Simon Horman <horms@kernel.org>



* Re: [PATCH v2 0/5] treewide: Convert buses to use generic driver_override
From: Greg KH @ 2026-05-22 11:11 UTC
  To: Danilo Krummrich
  Cc: rafael, linux, nipun.gupta, nikhil.agarwal, kys, haiyangz,
	wei.liu, decui, longli, andersson, mathieu.poirier, driver-core,
	linux-kernel, linux-hyperv, linux-arm-msm, linux-remoteproc
In-Reply-To: <20260505133935.3772495-1-dakr@kernel.org>

On Tue, May 05, 2026 at 03:37:20PM +0200, Danilo Krummrich wrote:
> This is the follow-up of the driver_override generalization in [1], converting
> the remaining 4 busses and removing the now-unused driver_set_override() helper.
> 
> All of them are prone to the potential UAF described in [2], caused by accessing
> the driver_override field from their corresponding match() callback.
> 
> In order to address this, the generalized driver_override field in struct device
> is protected with a spinlock. The driver-core provides accessors, such as
> device_match_driver_override(), device_has_driver_override() and
> device_set_driver_override(), which all ensure proper locking internally.
> 
> Additionally, the driver-core provides a driver_override flag in struct
> bus_type, which, once enabled, automatically registers generic sysfs callbacks,
> allowing userspace to modify the driver_override field.
> 
> This series is based on v7.1-rc1 with no additional dependencies, hence those
> patches can be picked up by subsystems individually.
> 
> [1] https://lore.kernel.org/driver-core/20260303115720.48783-1-dakr@kernel.org/
> [2] https://bugzilla.kernel.org/show_bug.cgi?id=220789
> [3] https://gitlab.com/driverctl/driverctl/-/blob/0.121/driverctl?ref_type=tags#L99


Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
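A userspace analogue of the locking pattern the cover letter describes, assuming the race is between a match() callback reading the override string and a sysfs write replacing and freeing it. A pthread mutex stands in for the kernel spinlock, and all names are hypothetical rather than the driver-core accessors. Every read and write of the string happens under the lock, and the old string is freed only after it has been unpublished.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical device with a lock-protected override string. */
struct model_device {
	pthread_mutex_t lock;
	char *driver_override; /* NULL means no override */
};

/* Replace the override under the lock; free the old string after unlock. */
static int model_set_override(struct model_device *dev, const char *name)
{
	char *new_name = NULL, *old;

	if (name && *name) {
		size_t len = strlen(name) + 1;

		new_name = malloc(len);
		if (!new_name)
			return -1;
		memcpy(new_name, name, len);
	}
	pthread_mutex_lock(&dev->lock);
	old = dev->driver_override;
	dev->driver_override = new_name;
	pthread_mutex_unlock(&dev->lock);
	free(old);
	return 0;
}

/* Compare while holding the lock, so the string cannot be freed mid-read. */
static bool model_match_override(struct model_device *dev, const char *drv)
{
	bool match;

	pthread_mutex_lock(&dev->lock);
	match = dev->driver_override && strcmp(dev->driver_override, drv) == 0;
	pthread_mutex_unlock(&dev->lock);
	return match;
}
```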


* Re: [PATCH net-next v9] net: mana: Expose hardware diagnostic info via debugfs
From: patchwork-bot+netdevbpf @ 2026-05-22 18:20 UTC
  To: Erni Sri Satya Vennela
  Cc: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
	edumazet, kuba, pabeni, kotaranov, horms, shradhagupta,
	dipayanroy, kees, linux-hyperv, netdev, linux-kernel, linux-rdma
In-Reply-To: <20260519064621.772154-1-ernis@linux.microsoft.com>

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 18 May 2026 23:46:10 -0700 you wrote:
> Add debugfs entries to expose hardware configuration and diagnostic
> information that aids in debugging driver initialization and runtime
> operations without adding noise to dmesg.
> 
> The debugfs directory for each PCI device is named using pci_name()
> (the unique BDF address), and its creation and removal is integrated
> into mana_gd_setup() and mana_gd_cleanup_device() respectively, so
> that all callers (probe, remove, suspend, resume, shutdown) share a
> single code path.
> 
> [...]

Here is the summary with links:
  - [net-next,v9] net: mana: Expose hardware diagnostic info via debugfs
    https://git.kernel.org/netdev/net-next/c/c227f8aaf22c

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH net v2] vsock: keep poll shutdown state consistent
From: patchwork-bot+netdevbpf @ 2026-05-22 18:30 UTC
  To: Ziyu Zhang
  Cc: sgarzare, davem, edumazet, kuba, pabeni, horms, acking,
	georgezhang, dtor, kys, haiyangz, wei.liu, decui, longli,
	stefanha, mst, jasowang, xuanzhuo, eperezma, bryan-bt.tan,
	vishnu.dasa, bcm-kernel-feedback-list, virtualization, netdev,
	linux-kernel, linux-hyperv, kvm, baijiaju1990, r33s3n6, gality369,
	zhenghaoran154, hanguidong02, zzzccc427
In-Reply-To: <20260519165636.62542-1-ziyuzhang201@gmail.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 20 May 2026 00:56:36 +0800 you wrote:
> vsock_poll() reads vsk->peer_shutdown before taking the socket lock
> to set EPOLLHUP and EPOLLRDHUP, then reads it again after taking
> the lock to report EOF readability. A shutdown packet can update
> peer_shutdown while poll is waiting for the lock, so one poll invocation
> can report EOF readability without the corresponding HUP/RDHUP bits.
> 
> For connectible sockets, take one peer_shutdown snapshot after
> lock_sock() and use it for all peer-shutdown-derived poll bits. For
> datagram sockets, which do not take lock_sock() in poll(), take one
> lockless READ_ONCE() snapshot and pair it with WRITE_ONCE() on the
> writer side.
> 
> [...]
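The core idea of the fix is to read the shared value once and derive all related poll bits from that single read, so they cannot disagree. A minimal userspace sketch of that pattern follows, with C11 relaxed atomics standing in for READ_ONCE()/WRITE_ONCE() and a deliberately simplified mask: only the peer's SEND_SHUTDOWN bit is considered, whereas the real vsock_poll() also looks at the local socket's shutdown state. The struct, function, and `P_*` flag names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Same value as the kernel's SEND_SHUTDOWN bit. */
#define SEND_SHUTDOWN 2

/* Hypothetical poll bits for this sketch (not the real EPOLL* values). */
#define P_IN    0x1
#define P_RDHUP 0x2

struct model_vsock {
	_Atomic int peer_shutdown; /* written by the packet-receive path */
};

/* Writer side: publish a new peer shutdown state. */
static void model_peer_shutdown(struct model_vsock *vsk, int mode)
{
	atomic_fetch_or_explicit(&vsk->peer_shutdown, mode, memory_order_relaxed);
}

/*
 * Reader side: load peer_shutdown once and derive every
 * peer-shutdown-dependent bit from that single snapshot.
 */
static unsigned int model_poll(struct model_vsock *vsk)
{
	int peer = atomic_load_explicit(&vsk->peer_shutdown, memory_order_relaxed);
	unsigned int mask = 0;

	if (peer & SEND_SHUTDOWN)
		mask |= P_RDHUP | P_IN; /* RDHUP and EOF readability set together */
	return mask;
}
```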

Here is the summary with links:
  - [net,v2] vsock: keep poll shutdown state consistent
    https://git.kernel.org/netdev/net/c/aae9d8a5528b

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html


