* [PATCH v2] PCI: hv: Allocate MMIO from above 4GB for the config window
@ 2026-04-02 23:43 Dexuan Cui
  2026-04-05 23:15 ` Michael Kelley
  0 siblings, 1 reply; 10+ messages in thread
From: Dexuan Cui @ 2026-04-02 23:43 UTC
  To: kys, haiyangz, wei.liu, decui, longli, lpieralisi, kwilczynski,
	mani, robh, bhelgaas, jakeo, linux-hyperv, linux-pci,
	linux-kernel, mhklinux, matthew.ruffell, kjlx
  Cc: Krister Johansen, stable

There has been a longstanding MMIO conflict between the pci_hyperv
driver's config_window (see hv_allocate_config_window()) and the
hyperv_drm (or hyperv_fb) driver (see hyperv_setup_vram()): typically
both get MMIO from the low MMIO range below 4GB; this is not an issue
in the normal kernel since the VMBus driver reserves the framebuffer
MMIO range in vmbus_reserve_fb(), so the drm driver's hyperv_setup_vram()
can always get the reserved framebuffer MMIO; however, a Gen2 VM's
kdump kernel can fail to reserve the framebuffer MMIO in
vmbus_reserve_fb() because the screen_info.lfb_base is zero in the
kdump kernel due to several possible reasons (see the Link below for
more details):

1) on ARM64, the two syscalls (KEXEC_LOAD, KEXEC_FILE_LOAD) don't
initialize the screen_info.lfb_base for the kdump kernel;

2) on x86-64, the KEXEC_FILE_LOAD syscall initializes kdump kernel's
screen_info.lfb_base, but the KEXEC_LOAD syscall doesn't really do that
when the hyperv_drm driver loads, because the user-space kexec-tools
(i.e. the program 'kexec') doesn't recognize the hyperv_drm driver
(let's ignore the behavior of kexec-tools of very old versions).

When vmbus_reserve_fb() fails to reserve the framebuffer MMIO in the
kdump kernel, if pci_hyperv in the kdump kernel loads before hyperv_drm
loads, pci_hyperv's vmbus_allocate_mmio() gets the framebuffer MMIO
and tries to use it, but since the host thinks that the MMIO range is
still in use by hyperv_drm, the host refuses to accept the MMIO range
as the config window, and pci_hyperv's hv_pci_enter_d0() errors out,
e.g. an error can be "PCI Pass-through VSP failed D0 Entry with status
c0370048".

Typically, this pci_hyperv error in the kdump kernel was not fatal in
the past because the kdump kernel typically doesn't rely on pci_hyperv,
i.e. the root file system is on a VMBus SCSI device.

Now, a VM on Azure can boot from NVMe, i.e. the root file system can be
on an NVMe device, which depends on pci_hyperv. When the error occurs,
the kdump kernel fails to boot up since no root file system is detected.

Fix the MMIO conflict by allocating MMIO above 4GB for the config_window,
so it won't conflict with hyperv_drm's MMIO, which should be below the
4GB boundary. The size of config_window is small: it's only 8KB per PCI
device, so there should be sufficient MMIO space available above 4GB.

Note: we still need to figure out how to address the possible MMIO
conflict between hyperv_drm and pci_hyperv in the case of 32-bit PCI
MMIO BARs, but that's of low priority because all PCI devices available
to a Linux VM on Azure or on a modern host should use 64-bit BARs and
should not use 32-bit BARs -- I checked Mellanox VFs, MANA VFs, NVMe
devices, and GPUs in Linux VMs on Azure, and found no 32-bit BARs.

Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
Link: https://lore.kernel.org/all/SA1PR21MB692176C1BC53BFC9EAE5CF8EBF51A@SA1PR21MB6921.namprd21.prod.outlook.com/
Tested-by: Matthew Ruffell <matthew.ruffell@canonical.com>
Tested-by: Krister Johansen <johansen@templeofstupid.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: stable@vger.kernel.org
---

Changes since v1:
    Updated the commit message and the comment to better explain
    why screen_info.lfb_base can be 0 in the kdump kernel.

    No code change since v1.


 drivers/pci/controller/pci-hyperv.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 2c7a406b4ba8..1a79334ea9f4 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -3403,9 +3403,26 @@ static int hv_allocate_config_window(struct hv_pcibus_device *hbus)
 
 	/*
 	 * Set up a region of MMIO space to use for accessing configuration
-	 * space.
+	 * space. Use the high MMIO range to not conflict with the hyperv_drm
+	 * driver (which normally gets MMIO from the low MMIO range) in the
+	 * kdump kernel of a Gen2 VM, which may fail to reserve the framebuffer
+	 * MMIO range in vmbus_reserve_fb() due to screen_info.lfb_base being
+	 * zero in the kdump kernel:
+	 *
+	 * on ARM64, the two syscalls (KEXEC_LOAD, KEXEC_FILE_LOAD) don't
+	 * initialize the screen_info.lfb_base for the kdump kernel;
+	 *
+	 * on x86-64, the KEXEC_FILE_LOAD syscall initializes kdump kernel's
+	 * screen_info.lfb_base (see bzImage64_load() -> setup_boot_parameters())
+	 * but the KEXEC_LOAD syscall doesn't really do that when the hyperv_drm
+	 * driver loads, because the user-space program 'kexec' doesn't
+	 * recognize hyperv_drm: see the function setup_linux_vesafb() in the
+	 * kexec-tools.git repo. Note: old versions of kexec-tools, e.g.
+	 * v2.0.18, initialize screen_info.lfb_base if the hyperv_fb driver
+	 * loads, but hyperv_fb is deprecated and has been removed from the
+	 * mainline kernel.
 	 */
-	ret = vmbus_allocate_mmio(&hbus->mem_config, hbus->hdev, 0, -1,
+	ret = vmbus_allocate_mmio(&hbus->mem_config, hbus->hdev, SZ_4G, -1,
 				  PCI_CONFIG_MMIO_LENGTH, 0x1000, false);
 	if (ret)
 		return ret;
-- 
2.43.0



* RE: [PATCH v2] PCI: hv: Allocate MMIO from above 4GB for the config window
  2026-04-02 23:43 [PATCH v2] PCI: hv: Allocate MMIO from above 4GB for the config window Dexuan Cui
@ 2026-04-05 23:15 ` Michael Kelley
  2026-04-08  9:24   ` Dexuan Cui
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Kelley @ 2026-04-05 23:15 UTC
  To: Dexuan Cui, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, longli@microsoft.com, lpieralisi@kernel.org,
	kwilczynski@kernel.org, mani@kernel.org, robh@kernel.org,
	bhelgaas@google.com, jakeo@microsoft.com,
	linux-hyperv@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, Michael Kelley,
	matthew.ruffell@canonical.com, kjlx@templeofstupid.com
  Cc: Krister Johansen, stable@vger.kernel.org

From: Dexuan Cui <decui@microsoft.com> Sent: Thursday, April 2, 2026 4:43 PM
> 
> There has been a longstanding MMIO conflict between the pci_hyperv
> driver's config_window (see hv_allocate_config_window()) and the
> hyperv_drm (or hyperv_fb) driver (see hyperv_setup_vram()): typically
> both get MMIO from the low MMIO range below 4GB; this is not an issue
> in the normal kernel since the VMBus driver reserves the framebuffer
> MMIO range in vmbus_reserve_fb(), so the drm driver's hyperv_setup_vram()
> can always get the reserved framebuffer MMIO; however, a Gen2 VM's
> kdump kernel can fail to reserve the framebuffer MMIO in
> vmbus_reserve_fb() because the screen_info.lfb_base is zero in the
> kdump kernel due to several possible reasons (see the Link below for
> more details):
> 
> 1) on ARM64, the two syscalls (KEXEC_LOAD, KEXEC_FILE_LOAD) don't
> initialize the screen_info.lfb_base for the kdump kernel;
> 
> 2) on x86-64, the KEXEC_FILE_LOAD syscall initializes kdump kernel's
> screen_info.lfb_base, but the KEXEC_LOAD syscall doesn't really do that
> when the hyperv_drm driver loads, because the user-space kexec-tools
> (i.e. the program 'kexec') doesn't recognize the hyperv_drm driver
> (let's ignore the behavior of kexec-tools of very old versions).
> 
> When vmbus_reserve_fb() fails to reserve the framebuffer MMIO in the
> kdump kernel, if pci_hyperv in the kdump kernel loads before hyperv_drm
> loads, pci_hyperv's vmbus_allocate_mmio() gets the framebuffer MMIO
> and tries to use it, but since the host thinks that the MMIO range is
> still in use by hyperv_drm, the host refuses to accept the MMIO range
> as the config window, and pci_hyperv's hv_pci_enter_d0() errors out,
> e.g. an error can be "PCI Pass-through VSP failed D0 Entry with status
> c0370048".
> 
> Typically, this pci_hyperv error in the kdump kernel was not fatal in
> the past because the kdump kernel typically doesn't rely on pci_hyperv,
> i.e. the root file system is on a VMBus SCSI device.
> 
> Now, a VM on Azure can boot from NVMe, i.e. the root file system can be
> on an NVMe device, which depends on pci_hyperv. When the error occurs,
> the kdump kernel fails to boot up since no root file system is detected.
> 
> Fix the MMIO conflict by allocating MMIO above 4GB for the config_window,
> so it won't conflict with hyperv_drm's MMIO, which should be below the
> 4GB boundary. The size of config_window is small: it's only 8KB per PCI
> device, so there should be sufficient MMIO space available above 4GB.
> 
> Note: we still need to figure out how to address the possible MMIO
> conflict between hyperv_drm and pci_hyperv in the case of 32-bit PCI
> MMIO BARs, but that's of low priority because all PCI devices available
> to a Linux VM on Azure or on a modern host should use 64-bit BARs and
> should not use 32-bit BARs -- I checked Mellanox VFs, MANA VFs, NVMe
> devices, and GPUs in Linux VMs on Azure, and found no 32-bit BARs.

Just to clarify, since this patch is predicated on all BARs being 64-bit,
hv_pci_allocate_bridge_windows() never encounters a non-zero
hbus->low_mmio_space, and hence also never allocates from low
MMIO space. So hv_pci_allocate_bridge_windows() does not need to be
patched. Is that correct?

Taking a broader view, fundamentally the current MMIO location of
the frame buffer may be unknown to the Linux guest. At the same time,
Linux must ensure that PCI devices don't get assigned to the MMIO space
where the frame buffer is located. While the current MMIO location of
the frame buffer may be unknown, we can assume it was placed in low
MMIO space by the host -- either Windows Hyper-V or Linux/VMM
in the root partition, and perhaps as mediated by a paravisor. Probably
need to confirm with the Linux-in-the-root partition team (and maybe
the OpenHCL team) that this assumption is true. Presumably the
hyperv_drm driver doesn't need to move the frame buffer, but if it
does, it must stay in the low MMIO space.

This patch depends on this assumption, and effectively reserves
the entire low MMIO space for the frame buffer. The low MMIO space
size defaults to 128 MiB on a local Hyper-V, and is set to 3 GiB in most
Azure VMs (or to 1 GiB in an Azure CVM), so that all gets reserved.

A slightly different approach to the whole problem is to change
vmbus_reserve_fb(). If it is unable to get a non-zero "start" value, then
it should use the same assumption as above, and reserve a frame buffer
area starting at the lowest address in low MMIO space. The reserved size
could be the max possible frame buffer size, which I think is 64 MiB (?).
This still leaves low MMIO space for subsequent PCI devices, and allows
32-bit BARs to continue to work. This approach requires one further
assumption, which is that the host, plus any movement by hyperv_drm,
has kept the frame buffer at the low end of the low MMIO space. From
what I've seen, that assumption is reality -- the frame buffer always
starts at the beginning of low MMIO space.
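
As an illustration only, the fallback described above could be sketched
in plain C as follows. This is not the kernel's vmbus_reserve_fb():
struct fb_range, pick_fb_reservation() and the clamp to the low MMIO
size are hypothetical names and assumptions made for this sketch, and
the maximum frame buffer size is passed in by the caller because the
64 MiB figure above is only tentative.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_1M (1024ULL * 1024ULL)

/* Hypothetical type for this sketch; not a kernel structure. */
struct fb_range {
	uint64_t start;
	uint64_t size;
};

/*
 * Pick the MMIO range to reserve for the frame buffer.
 *
 * fb_start/fb_size: location reported to the guest (0 if unknown).
 * low_base/low_size: the low MMIO window (below 4 GiB).
 * max_fb_size:      assumed upper bound on the frame buffer size.
 */
static struct fb_range pick_fb_reservation(uint64_t fb_start, uint64_t fb_size,
					   uint64_t low_base, uint64_t low_size,
					   uint64_t max_fb_size)
{
	struct fb_range r;

	if (fb_start != 0) {
		/* The location is known: reserve exactly that range. */
		r.start = fb_start;
		r.size = fb_size;
		return r;
	}

	/*
	 * Unknown location: assume the frame buffer sits at the low end
	 * of low MMIO space. Clamping to the window size is an assumption
	 * of this sketch, so that the reservation never exceeds the window.
	 */
	assert(low_size > 0);
	r.start = low_base;
	r.size = max_fb_size < low_size ? max_fb_size : low_size;
	return r;
}
```

With an unknown start, a 128 MiB low MMIO window at 4 GiB - 128 MiB
(0xf800_0000) and a 64 MiB maximum, the sketch reserves 64 MiB at
0xf800_0000 and leaves the rest of the window for 32-bit BARs.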

This approach could be taken one step further, where vmbus_reserve_fb()
*always* reserves 64 MiB starting at the low end of low MMIO space,
regardless of the value of "start". The messy code for getting "start"
could be dropped entirely, and the dependency on CONFIG_SYSFB goes
away. Or maybe still get the value of "start" and "size", and if non-zero
just do a sanity check that they are within the fixed 64 MiB reserved area.

Thoughts? To me tweaking vmbus_reserve_fb() is a more
straightforward and explicit way to do the reserving, vs. modifying
the requested range in the Hyper-V PCI driver. And FWIW, it avoids
introducing the 32-bit BAR limitation.

Michael

> 
> Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
> Link: https://lore.kernel.org/all/SA1PR21MB692176C1BC53BFC9EAE5CF8EBF51A@SA1PR21MB6921.namprd21.prod.outlook.com/
> Tested-by: Matthew Ruffell <matthew.ruffell@canonical.com>
> Tested-by: Krister Johansen <johansen@templeofstupid.com>
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> Cc: stable@vger.kernel.org
> ---
> 
> Changes since v1:
>     Updated the commit message and the comment to better explain
>     why screen_info.lfb_base can be 0 in the kdump kernel.
> 
>     No code change since v1.
> 
> 
>  drivers/pci/controller/pci-hyperv.c | 21 +++++++++++++++++++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index 2c7a406b4ba8..1a79334ea9f4 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -3403,9 +3403,26 @@ static int hv_allocate_config_window(struct
> hv_pcibus_device *hbus)
> 
>  	/*
>  	 * Set up a region of MMIO space to use for accessing configuration
> -	 * space.
> +	 * space. Use the high MMIO range to not conflict with the hyperv_drm
> +	 * driver (which normally gets MMIO from the low MMIO range) in the
> +	 * kdump kernel of a Gen2 VM, which may fail to reserve the framebuffer
> +	 * MMIO range in vmbus_reserve_fb() due to screen_info.lfb_base being
> +	 * zero in the kdump kernel:
> +	 *
> +	 * on ARM64, the two syscalls (KEXEC_LOAD, KEXEC_FILE_LOAD) don't
> +	 * initialize the screen_info.lfb_base for the kdump kernel;
> +	 *
> +	 * on x86-64, the KEXEC_FILE_LOAD syscall initializes kdump kernel's
> +	 * screen_info.lfb_base (see bzImage64_load() -> setup_boot_parameters())
> +	 * but the KEXEC_LOAD syscall doesn't really do that when the hyperv_drm
> +	 * driver loads, because the user-space program 'kexec' doesn't
> +	 * recognize hyperv_drm: see the function setup_linux_vesafb() in the
> +	 * kexec-tools.git repo. Note: old versions of kexec-tools, e.g.
> +	 * v2.0.18, initialize screen_info.lfb_base if the hyperv_fb driver
> +	 * loads, but hyperv_fb is deprecated and has been removed from the
> +	 * mainline kernel.
>  	 */
> -	ret = vmbus_allocate_mmio(&hbus->mem_config, hbus->hdev, 0, -1,
> +	ret = vmbus_allocate_mmio(&hbus->mem_config, hbus->hdev, SZ_4G, -1,
>  				  PCI_CONFIG_MMIO_LENGTH, 0x1000, false);
>  	if (ret)
>  		return ret;
> --
> 2.43.0
> 



* RE: [PATCH v2] PCI: hv: Allocate MMIO from above 4GB for the config window
  2026-04-05 23:15 ` Michael Kelley
@ 2026-04-08  9:24   ` Dexuan Cui
  2026-04-08 13:53     ` Michael Kelley
  2026-04-16 18:49     ` Dexuan Cui
  0 siblings, 2 replies; 10+ messages in thread
From: Dexuan Cui @ 2026-04-08  9:24 UTC
  To: Michael Kelley, KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org,
	Long Li, lpieralisi@kernel.org, kwilczynski@kernel.org,
	mani@kernel.org, robh@kernel.org, bhelgaas@google.com,
	Jake Oshins, linux-hyperv@vger.kernel.org,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	matthew.ruffell@canonical.com, kjlx@templeofstupid.com
  Cc: Krister Johansen, stable@vger.kernel.org

> From: Michael Kelley <mhklinux@outlook.com>
> Sent: Sunday, April 5, 2026 4:15 PM
> > ...
> > Note: we still need to figure out how to address the possible MMIO
> > conflict between hyperv_drm and pci_hyperv in the case of 32-bit PCI
> > MMIO BARs, but that's of low priority because all PCI devices available
> > to a Linux VM on Azure or on a modern host should use 64-bit BARs and
> > should not use 32-bit BARs -- I checked Mellanox VFs, MANA VFs, NVMe
> > devices, and GPUs in Linux VMs on Azure, and found no 32-bit BARs.
> 
> Just to clarify, since this patch is predicated on all BARs being 64-bit,
> hv_pci_allocate_bridge_windows() never encounters a non-zero
> hbus->low_mmio_space, and hence also never allocates from low
> MMIO space. So hv_pci_allocate_bridge_windows() does not need to be
> patched. Is that correct?

Correct. For 32-bit BARs (if any), IMO we can't really do anything for
them in hv_pci_allocate_bridge_windows(), since they must reside
below 4GB.

Note: while the patch doesn't fix the MMIO conflict if there are any
32-bit BARs, the patch doesn't make things worse for 32-bit BARs (if any).

> Taking a broader view, fundamentally the current MMIO location of
> the frame buffer may be unknown to the Linux guest. At the same time,
> Linux must ensure that PCI devices don't get assigned to the MMIO space
> where the frame buffer is located. While the current MMIO location of
> the frame buffer may be unknown, we can assume it was placed in low
> MMIO space by the host -- either Windows Hyper-V or Linux/VMM
> in the root partition, and perhaps as mediated by a paravisor. Probably
> need to confirm with the Linux-in-the-root partition team (and maybe
> the OpenHCL team) that this assumption is true. 

IMO this is a good idea! It looks like the framebuffer base always starts
at the beginning of the low MMIO space. We can reserve some
MMIO for the framebuffer at the beginning of the low MMIO space.

> Presumably the
> hyperv_drm driver doesn't need to move the frame buffer, but if it
> does, it must stay in the low MMIO space.

It looks like this assumption is true.

> This patch depends on this assumption, and effectively reserves
> the entire low MMIO space for the frame buffer. 

To make it precise, the patch reserves the entire low MMIO space for
the frame buffer and the 32-bit BARs (if any), and there is no MMIO
conflict in the first kernel (assuming hyperv_drm doesn't relocate the
MMIO range), and there can be an MMIO conflict in the
kdump/kexec kernel if there is any 32-bit BAR.

> The low MMIO space
> size defaults to 128 MiB on a local Hyper-V, 
Yes, by default, the low MMIO base = 0xf800_0000 and the size = 128MB,
but the range [0xfed4_0000, 0xffff_ffff], whose size is 18.75MB,
is reserved for vTPM: see vmbus_walk_resources(). So by default
the available low MMIO size for hyperv_drm is 128 - 18.75 =
109.25 MB.
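
The sizes above can be re-derived with a few lines of C. This is only
a sanity-check sketch: the macro and helper names are made up for the
example (SZ_4G is defined locally here rather than taken from the
kernel headers), and the addresses are the default values quoted above.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4G         0x100000000ULL
#define LOW_MMIO_BASE 0xf8000000ULL /* default low MMIO base quoted above */
#define VTPM_BASE     0xfed40000ULL /* start of the range reserved for vTPM */

/* Number of bytes in [start, end); the caller must pass start <= end. */
static uint64_t range_size(uint64_t start, uint64_t end)
{
	assert(start <= end);
	return end - start;
}

/* Convert a byte count to MB, where 1 MB = 2^20 bytes as in this thread. */
static double to_mb(uint64_t bytes)
{
	return (double)bytes / (1024.0 * 1024.0);
}
```

Here to_mb(range_size(VTPM_BASE, SZ_4G)) is the vTPM reservation and
to_mb(range_size(LOW_MMIO_BASE, VTPM_BASE)) is what remains for
hyperv_drm in the default configuration.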

The size of the framebuffer should be aligned to 2MB, so if the
framebuffer size is bigger than 108MB, it looks like there is not
enough MMIO space in the low MMIO range, e.g. with the command
below:

    Set-VMVideo -VMName vm_name -HorizontalResolution 7680
        -VerticalResolution 4320 -ResolutionType Maximum

the resulting max framebuffer size is
7680 * 4320 * 32/8 / 1024 / 1024 = 126.5625 MB, which would be
rounded up to 128MB.
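
The resolution arithmetic can be written out as a small C sketch,
assuming 32 bits (4 bytes) per pixel and 2MB alignment as stated above.
The helper names round_up_pow2() and max_fb_bytes() are hypothetical
and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_1M (1024ULL * 1024ULL)

/* Round value up to a multiple of align; align must be a power of two. */
static uint64_t round_up_pow2(uint64_t value, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (value + align - 1) & ~(align - 1);
}

/* Frame buffer size in bytes for a given resolution at 4 bytes per pixel. */
static uint64_t max_fb_bytes(uint64_t width, uint64_t height)
{
	return round_up_pow2(width * height * 4, 2 * SZ_1M);
}
```

For 7680x4320 the unrounded size is 132,710,400 bytes (126.5625 MB),
and max_fb_bytes(7680, 4320) gives 128 MB after rounding.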

However, according to my testing, with the above command,
the low MMIO base = 0xf000_0000, size=256MB, so it's probably
ok to reserve 128 MB for the frame buffer. 

In case the low MMIO size is <=64MB, we would want to reserve
less MMIO for the frame buffer.

> and is set to 3 GiB in most
> Azure VMs (or to 1 GiB in an Azure CVM), so that all gets reserved.
> 
> A slightly different approach to the whole problem is to change
> vmbus_reserve_fb(). If it is unable to get a non-zero "start" value, then
> it should use the same assumption as above, and reserve a frame buffer
> area starting at the lowest address in low MMIO space. The reserved size
> could be the max possible frame buffer size, which I think is 64 MiB (?).

It can be 128MB with the highest resolution 7680*4320 (I hope the
highest resolution won't become bigger in the future).

> This still leaves low MMIO space for subsequent PCI devices, and allows
> 32-bit BARs to continue to work. This approach requires one further
> assumption, which is that the host, plus any movement by hyperv_drm,
> has kept the frame buffer at the low end of the low MMIO space. From
> what I've seen, that assumption is reality -- the frame buffer always
> starts at the beginning of low MMIO space.
> 
> This approach could be taken one step further, where vmbus_reserve_fb()
> *always* reserves 64 MiB starting at the low end of low MMIO space,
> regardless of the value of "start". The messy code for getting "start"
> could be dropped entirely, and the dependency on CONFIG_SYSFB goes
> away. Or maybe still get the value of "start" and "size", and if non-zero
> just do a sanity check that they are within the fixed 64 MiB reserved area.
> 
> Thoughts? To me tweaking vmbus_reserve_fb() is a more
> straightforward and explicit way to do the reserving, vs. modifying
> the requested range in the Hyper-V PCI driver. 

Agreed. Let me try to make a new patch for review.

> And FWIW, it avoids  introducing the 32-bit BAR limitation.

This patch addresses the MMIO conflict for 64-bit BARs and not for
32-bit BARs (if any). The patch does not introduce the 32-bit BAR limitation.

Thanks,
-- Dexuan


* RE: [PATCH v2] PCI: hv: Allocate MMIO from above 4GB for the config window
  2026-04-08  9:24   ` Dexuan Cui
@ 2026-04-08 13:53     ` Michael Kelley
  2026-04-15 15:30       ` Dexuan Cui
  2026-04-16 18:49     ` Dexuan Cui
  1 sibling, 1 reply; 10+ messages in thread
From: Michael Kelley @ 2026-04-08 13:53 UTC
  To: Dexuan Cui, Michael Kelley, KY Srinivasan, Haiyang Zhang,
	wei.liu@kernel.org, Long Li, lpieralisi@kernel.org,
	kwilczynski@kernel.org, mani@kernel.org, robh@kernel.org,
	bhelgaas@google.com, Jake Oshins, linux-hyperv@vger.kernel.org,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	matthew.ruffell@canonical.com, kjlx@templeofstupid.com
  Cc: Krister Johansen, stable@vger.kernel.org

From: Dexuan Cui <DECUI@microsoft.com> Sent: Wednesday, April 8, 2026 2:24 AM
> 
> > From: Michael Kelley <mhklinux@outlook.com>
> > Sent: Sunday, April 5, 2026 4:15 PM
> > > ...
> > > Note: we still need to figure out how to address the possible MMIO
> > > conflict between hyperv_drm and pci_hyperv in the case of 32-bit PCI
> > > MMIO BARs, but that's of low priority because all PCI devices available
> > > to a Linux VM on Azure or on a modern host should use 64-bit BARs and
> > > should not use 32-bit BARs -- I checked Mellanox VFs, MANA VFs, NVMe
> > > devices, and GPUs in Linux VMs on Azure, and found no 32-bit BARs.
> >
> > Just to clarify, since this patch is predicated on all BARs being 64-bit,
> > hv_pci_allocate_bridge_windows() never encounters a non-zero
> > hbus->low_mmio_space, and hence also never allocates from low
> > MMIO space. So hv_pci_allocate_bridge_windows() does not need to be
> > patched. Is that correct?
> 
> Correct. For 32-bit BARs (if any), IMO we can't really do anything for
> them in hv_pci_allocate_bridge_windows(), since they must reside
> below 4GB.
> 
> Note: while the patch doesn't fix the MMIO conflict if there are any
> 32-bit BARs, the patch doesn't make things worse for 32-bit BARs (if any).

OK, right. Your patch doesn't prevent 32-bit BARs from working. It
just doesn't fix any potential frame buffer conflicts with 32-bit BARs.
I misinterpreted the situation.

> 
> > Taking a broader view, fundamentally the current MMIO location of
> > the frame buffer may be unknown to the Linux guest. At the same time,
> > Linux must ensure that PCI devices don't get assigned to the MMIO space
> > where the frame buffer is located. While the current MMIO location of
> > the frame buffer may be unknown, we can assume it was placed in low
> > MMIO space by the host -- either Windows Hyper-V or Linux/VMM
> > in the root partition, and perhaps as mediated by a paravisor. Probably
> > need to confirm with the Linux-in-the-root partition team (and maybe
> > the OpenHCL team) that this assumption is true.
> 
> IMO this is a good idea! It looks like the framebuffer base always starts
> at the beginning of the low MMIO space. We can reserve some
> MMIO for the framebuffer at the beginning of the low MMIO space.
> 
> > Presumably the
> > hyperv_drm driver doesn't need to move the frame buffer, but if it
> > does, it must stay in the low MMIO space.
> 
> It looks like this assumption is true.
> 
> > This patch depends on this assumption, and effectively reserves
> > the entire low MMIO space for the frame buffer.
> 
> To make it precise, the patch reserves the entire low MMIO space for
> the frame buffer and the 32-bit BARs (if any), and there is no MMIO
> conflict in the first kernel (assuming hyperv_drm doesn't relocate the
> MMIO range), and there can be an MMIO conflict in the
> kdump/kexec kernel if there is any 32-bit BAR.
> 
> > The low MMIO space
> > size defaults to 128 MiB on a local Hyper-V,
> Yes, by default, the low MMIO base =0xf800_0000, size=128MB,
> but the range [0xfed4_0000, 0xffff_ffff], whose size is 18.75MB,
> is reserved for vTPM: see vmbus_walk_resources(). So by default
> the available low MMIO size for hyperv_drm is 128 - 18.75 =
> 109.25 MB.
> 
> The size of the framebuffer should be aligned to 2MB, so if the
> framebuffer size is bigger than 108MB, it looks like there is not
> enough MMIO space in the low MMIO range, e.g. with the below
> command:
> Set-VMVideo -VMName vm_name -HorizontalResolution 7680
> -VerticalResolution 4320 -ResolutionType Maximum
> , the resulting max framebuffer size is
> 7680 * 4320 * 32/8 /1024.0/1024 = 126.5625, which would be
> rounded up to 128MB.
> 
> However, according to my testing, with the above command,
> the low MMIO base = 0xf000_0000, size=256MB, so it's probably
> ok to reserve 128 MB for the frame buffer.
> 
> In case the low MMIO size is <=64MB, we would want to reserve
> less MMIO for the frame buffer.
> 
> > and is set to 3 GiB in most
> > Azure VMs (or to 1 GiB in an Azure CVM), so that all gets reserved.
> >
> > A slightly different approach to the whole problem is to change
> > vmbus_reserve_fb(). If it is unable to get a non-zero "start" value, then
> > it should use the same assumption as above, and reserve a frame buffer
> > area starting at the lowest address in low MMIO space. The reserved size
> > could be the max possible frame buffer size, which I think is 64 MiB (?).
> 
> It can be 128MB with the highest resolution 7680*4320 (I hope the
> highest resolution won't become bigger in the future).

Indeed!

> 
> > This still leaves low MMIO space for subsequent PCI devices, and allows
> > 32-bit BARs to continue to work. This approach requires one further
> > assumption, which is that the host, plus any movement by hyperv_drm,
> > has kept the frame buffer at the low end of the low MMIO space. From
> > what I've seen, that assumption is reality -- the frame buffer always
> > starts at the beginning of low MMIO space.
> >
> > This approach could be taken one step further, where vmbus_reserve_fb()
> > *always* reserves 64 MiB starting at the low end of low MMIO space,
> > regardless of the value of "start". The messy code for getting "start"
> > could be dropped entirely, and the dependency on CONFIG_SYSFB goes
> > away. Or maybe still get the value of "start" and "size", and if non-zero
> > just do a sanity check that they are within the fixed 64 MiB reserved area.
> >
> > Thoughts? To me tweaking vmbus_reserve_fb() is a more
> > straightforward and explicit way to do the reserving, vs. modifying
> > the requested range in the Hyper-V PCI driver.
> 
> Agreed. Let me try to make a new patch for review.
> 
> > And FWIW, it avoids  introducing the 32-bit BAR limitation.
> 
> This patch addresses the MMIO conflict for 64-bit BARs and not for
> 32-bit BARs (if any). The patch does not introduce the 32-bit BAR limitation.

Right.  I misinterpreted the problem you mentioned about 32-bit BARs.

Michael


* RE: [PATCH v2] PCI: hv: Allocate MMIO from above 4GB for the config window
  2026-04-08 13:53     ` Michael Kelley
@ 2026-04-15 15:30       ` Dexuan Cui
  2026-04-15 16:46         ` Dexuan Cui
  2026-04-23 17:40         ` Michael Kelley
  0 siblings, 2 replies; 10+ messages in thread
From: Dexuan Cui @ 2026-04-15 15:30 UTC
  To: Michael Kelley, KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org,
	Long Li, lpieralisi@kernel.org, kwilczynski@kernel.org,
	mani@kernel.org, robh@kernel.org, bhelgaas@google.com,
	Jake Oshins, linux-hyperv@vger.kernel.org,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	matthew.ruffell@canonical.com, kjlx@templeofstupid.com
  Cc: Krister Johansen, stable@vger.kernel.org

> From: Michael Kelley <mhklinux@outlook.com>
> Sent: Wednesday, April 8, 2026 6:54 AM
> > > ...
> > > A slightly different approach to the whole problem is to change
> > > vmbus_reserve_fb(). If it is unable to get a non-zero "start" value, then
> > > it should use the same assumption as above, and reserve a frame buffer
> > > area starting at the lowest address in low MMIO space. The reserved size

The framebuffer base of Gen1 VMs always starts at 4GB-128MB, even if
the low mmio base is 1GB.

> > > could be the max possible frame buffer size, which I think is 64 MiB (?).
> >
> > It can be 128MB with the highest resolution 7680*4320 (I hope the
> > highest resolution won't become bigger in the future).
> 
> Indeed!
> 
> >
> > > This still leaves low MMIO space for subsequent PCI devices, and allows
> > > 32-bit BARs to continue to work. This approach requires one further
> > > assumption, which is that the host, plus any movement by hyperv_drm,
> > > has kept the frame buffer at the low end of the low MMIO space. From
> > > what I've seen, that assumption is reality -- the frame buffer always
> > > starts at the beginning of low MMIO space.
> > >
> > > This approach could be taken one step further, where vmbus_reserve_fb()
> > > *always* reserves 64 MiB starting at the low end of low MMIO space,
> > > regardless of the value of "start". The messy code for getting "start"
> > > could be dropped entirely, and the dependency on CONFIG_SYSFB goes
> > > away. Or maybe still get the value of "start" and "size", and if non-zero
> > > just do a sanity check that they are within the fixed 64 MiB reserved area.
> > >
> > > Thoughts? To me tweaking vmbus_reserve_fb() is a more
> > > straightforward and explicit way to do the reserving, vs. modifying
> > > the requested range in the Hyper-V PCI driver.
> >
> > Agreed. Let me try to make a new patch for review.

Please refer to my testing results and my thoughts below:


On x86-64 lab hosts, I tested Gen1 and Gen2 VMs on the latest
Hyper-V build, and on Windows Server 2019
(Hyper-V: Hypervisor Build 10.0.17763.8510-8-0), and I saw the
same host behavior on both hosts:

1) The max required framebuffer size is determined by Set-VMVideo,
   and is reported to the guest hyperv_drm driver via
   hdev->channel->offermsg.offer.mmio_megabytes.

   1.1) For Gen1 VMs, the framebuffer's base is reported via the
        legacy PCI graphics device's BAR: the PCI BAR's base is
        hardcoded to 4G-128MB, and the size is hardcoded to 64MB,
        but the hyperv_drm driver can use a framebuffer size bigger
        than 64MB when Set-VMVideo specifies a big framebuffer.

   1.2) For Gen2 VMs, the framebuffer's base is reported via the
        UEFI firmware, and the size is hardcoded to 3MB, but the
        hyperv_drm driver can use a framebuffer size bigger than
        64MB when Set-VMVideo specifies a big framebuffer.

2) The low mmio range is affected by the PowerShell command
   "Set-VM -LowMemoryMappedIoSpace". Note: the command only accepts
   a value between 128MB and 3.5GB.

3) For Gen2 VMs, the low mmio range is also affected by another
   command "Set-VMVideo", and the framebuffer always starts at the
   beginning of the low mmio range.

   3.1) By default, both the low mmio range and the framebuffer
        start at the fixed location 4G-128MB. If the max
        framebuffer size is X MB bigger than 64MB, the
        low_mmio_base decreases by 2*X MB.

   3.2) With "Set-VM -LowMemoryMappedIoSpace 1GB", the
        low_mmio_base is 3GB, the low_mmio_size=1GB. The
        fb_mmio_base is also 3GB; if the max framebuffer size is
        X MB bigger than 64MB, the low_mmio_base decreases by
        2*X MB.

4) For Gen1 VMs, the framebuffer always starts at the fixed
   location 4G-128MB.

   4.1) By default, the low mmio range also starts at 4G-128MB,
        and the size is 127.75 MB, i.e. if
        hdev->channel->offermsg.offer.mmio_megabytes needs 128MB,
        the guest hyperv_drm driver can't find enough available
        mmio in the low mmio range, and has to use the high mmio
        range.

   4.2) With "Set-VM -LowMemoryMappedIoSpace 1GB", the
        low_mmio_base is 3GB, the low_mmio_size=1023.75 MB. The
        fb_mmio_base is still 4G-128MB, i.e. if hyperv_drm needs
        128 MB of mmio, it still has to use the high mmio range.

5) Note: the mmio range [VTPM_BASE_ADDRESS, 4GB), whose size is
   18.75MB, cannot be used by the framebuffer.
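
The sizes in item 5) can be double-checked with a little arithmetic.
The sketch below assumes VTPM_BASE_ADDRESS is 0xfed40000; this email
does not state the value, so treat it as an assumption (it is
consistent with the hv_mmio range ending at 0xfed3ffff in the boot
log quoted later in this thread). The helper names are hypothetical.

```python
MB = 1 << 20
GB = 1 << 30

# Assumption: VTPM_BASE_ADDRESS is 0xfed40000 (not stated in this email).
VTPM_BASE_ADDRESS = 0xFED40000


def vtpm_range_size():
    """Size in bytes of the range [VTPM_BASE_ADDRESS, 4GB)."""
    return 4 * GB - VTPM_BASE_ADDRESS


def usable_low_mmio(low_mmio_base):
    """Bytes of low mmio between low_mmio_base and VTPM_BASE_ADDRESS."""
    return max(0, VTPM_BASE_ADDRESS - low_mmio_base)


def largest_fb_that_fits(low_mmio_base):
    """Largest 2MB-aligned framebuffer size that fits in the usable range."""
    return usable_low_mmio(low_mmio_base) // (2 * MB) * (2 * MB)


if __name__ == "__main__":
    base = 4 * GB - 128 * MB
    print(vtpm_range_size() / MB)          # 18.75
    print(usable_low_mmio(base) / MB)      # 109.25
    print(largest_fb_that_fits(base) / MB) # 108.0
```

With the default base of 4G-128MB this gives 109.25MB of usable low
mmio, which is where the 108MB limit for Gen1 VMs comes from.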

To recap, according to my testing, the pseudo code of the
host/guest firmware that determines the low mmio range and the
framebuffer range should be:

max_fb_size = round_up_to_2MB(HorizontalResolution *
                              VerticalResolution * 4);

if (is_gen1_VM) {
    low_mmio_base = 4G - 128MB
    fb_mmio_base = 4G - 128MB
    low_mmio_size = 128MB - 0.25MB
} else { /* Gen2 VMs */
    low_mmio_base = 4G - 128MB
    low_mmio_size = 128MB

    excess_fb_size = (max_fb_size > 64MB) ?
                     (max_fb_size - 64MB) : 0;
    low_mmio_base -= excess_fb_size * 2;
    low_mmio_size = 4GB - low_mmio_base
    fb_mmio_base = low_mmio_base;
}

if ("Set-VM -LowMemoryMappedIoSpace" sets a target_low_mmio_size) {
    target_low_mmio_size = round_up_to_2MB(target_low_mmio_size)

    if (4GB - target_low_mmio_size < low_mmio_base) {
        low_mmio_base = 4GB - target_low_mmio_size

        if (is_gen1_VM) {
            low_mmio_size = target_low_mmio_size - 0.25MB
            // fb_mmio_base is still 4GB - 128MB
        } else {
            low_mmio_size = target_low_mmio_size
            fb_mmio_base = low_mmio_base;
        }
    }
}
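
For anyone who wants to replay the examples in this email, here is a
runnable transcription of the pseudo code above. It is only a model
of the behavior observed on the x86-64 lab hosts, not the host's
actual implementation, and it does not cover ARM64. The function
and parameter names are hypothetical.

```python
MB = 1 << 20
GB = 1 << 30
FOUR_GB = 4 * GB


def round_up(value, align):
    """Round value up to the next multiple of align."""
    return (value + align - 1) // align * align


def max_fb_size(h_res, v_res):
    """Max framebuffer size in bytes: 4 bytes per pixel, 2MB aligned."""
    return round_up(h_res * v_res * 4, 2 * MB)


def low_mmio_layout(is_gen1, h_res, v_res, target_low_mmio_size=None):
    """Return (low_mmio_base, low_mmio_size, fb_mmio_base) in bytes."""
    low_mmio_base = FOUR_GB - 128 * MB
    fb_mmio_base = low_mmio_base
    if is_gen1:
        low_mmio_size = 128 * MB - MB // 4
    else:
        # Gen2: every MB beyond 64MB lowers the base by 2MB.
        excess = max(max_fb_size(h_res, v_res) - 64 * MB, 0)
        low_mmio_base -= 2 * excess
        low_mmio_size = FOUR_GB - low_mmio_base
        fb_mmio_base = low_mmio_base

    if target_low_mmio_size is not None:
        target = round_up(target_low_mmio_size, 2 * MB)
        # The Set-VM value only takes effect if it lowers the base.
        if FOUR_GB - target < low_mmio_base:
            low_mmio_base = FOUR_GB - target
            if is_gen1:
                low_mmio_size = target - MB // 4
                # fb_mmio_base stays at 4GB - 128MB
            else:
                low_mmio_size = target
                fb_mmio_base = low_mmio_base
    return low_mmio_base, low_mmio_size, fb_mmio_base


if __name__ == "__main__":
    base, size, fb = low_mmio_layout(False, 4834, 3622, 128 * MB)
    print(hex(base), size // MB, hex(fb))
```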

e.g. for a Gen2 VM with the below commands:
   Set-VM -LowMemoryMappedIoSpace 128MB \
          -VMName decui-u2204-gen2-fb
   // i.e. the default setting on a lab host
   Set-VMVideo -VMName decui-u2204-gen2-fb \
               -HorizontalResolution 4834 \
               -VerticalResolution 3622 \
               -ResolutionType Single
we have:
    max_fb_size = round_up_to_2MB(4834*3622*4) = 68 MB
    excess_fb_size = 4MB
    low_mmio_base = 4GB - 128MB - 4MB * 2
                  = 4GB - 136 MB = 0xf7800000
    fb_mmio_base = low_mmio_base
    low_mmio_size = 4GB - low_mmio_base = 136MB

    In this case, we'd like to reserve low_mmio_size/2 = 68MB
    (rather than a fixed value of 128MB) for the framebuffer mmio:
    actually we can't reserve 128MB from the low mmio range,
    because the range [VTPM_BASE_ADDRESS, 4GB), whose size is
    18.75MB, is reserved for vTPM and other system devices like
    the I/O APIC, so the available low mmio size is only
    136MB - 18.75MB = 117.25MB.

    If we further run
    "Set-VM -LowMemoryMappedIoSpace 150MB \
     -VMName decui-u2204-gen2-fb", we have
    max_fb_size = round_up_to_2MB(4834*3622*4) = 68 MB
    excess_fb_size = 4MB
    low_mmio_base = 4GB - 128MB - 4MB * 2
                  = 4GB - 136 MB = 0xf7800000
    but 4GB - target_low_mmio_size = 4GB - 150MB, which is
    smaller than low_mmio_base, so low_mmio_base and
    fb_mmio_base are both set to 4GB - 150MB = 0xf6a00000,
    and low_mmio_size = 150MB. In this case, we'd like to
    reserve low_mmio_size/2 = 75MB for the framebuffer mmio,
    since we don't know the exact framebuffer size in
    vmbus_reserve_fb().

    With the same PowerShell commands, if the VM is a Gen1 VM,
    the low_mmio_base = 0xf6a00000, and
    low_mmio_size = 149.75MB but the fb_mmio_base is
    4GB - 128MB = 0xf8000000.

Another example is: for a Gen2 VM with the below commands:
   Set-VM -LowMemoryMappedIoSpace 1GB \
          -VMName decui-u2204-gen2-fb
   // i.e. the default setting on Azure. Let's ignore CVMs here.
   Set-VMVideo -VMName decui-u2204-gen2-fb \
               -HorizontalResolution 4834 \
               -VerticalResolution 3622 \
               -ResolutionType Single
we have:
    max_fb_size = round_up_to_2MB(4834*3622*4) = 68 MB
    excess_fb_size = 4MB
    low_mmio_base = 4GB - 128MB - 4MB * 2
                  = 4GB - 136 MB = 0xf7800000
    but 4GB - target_low_mmio_size = 4GB - 1GB, which is
    smaller than low_mmio_base, so low_mmio_base and
    fb_mmio_base are both set to 4GB - 1GB = 0xc0000000,
    and low_mmio_size = 1GB.
    In this case, we'd like to reserve
    min(low_mmio_size/2, 128MB) = 128MB for the framebuffer
    mmio, since the max possible framebuffer so far is 128MB.

************************************

On an ARM64 lab host, I also tested Gen2 VMs (there is no Gen1 VM
for ARM VMs):

By default:
  low_mmio_base = 4GB - 512MB, i.e. 0xe0000000
  low_mmio_size = 512MB
  fb_mmio_base = low_mmio_base
  The default framebuffer size is 3MB
  (i.e. screen.lfb_size = 3MB) but hyperv_drm:
  mmio_megabytes = 8 MB, which supports up to 1920 * 1080.

With the below commands:
   Set-VM -LowMemoryMappedIoSpace 512MB \
          -VMName decui-u2204-gen2-fb
   // the command only accepts a value between 512MB and 3.5GB.
   Set-VMVideo -VMName decui-u2204-gen2-fb \
               -HorizontalResolution 4834 \
               -VerticalResolution 3622 \
               -ResolutionType Single
I thought we would have:
    max_fb_size = round_up_to_2MB(4834*3622*4) = 68 MB
    excess_fb_size = 4MB
    low_mmio_base = 4GB - 512MB - 4MB * 2
                  = 4GB - 520MB
    fb_mmio_base = low_mmio_base
    low_mmio_size = 4GB - low_mmio_base = 520MB

    Since 4GB - target_low_mmio_size = 4GB - 512MB is bigger
    than low_mmio_base, low_mmio_base and fb_mmio_base would
    both remain set to 4GB - 520MB, and low_mmio_size would be
    520MB.

    However, the actual result is:
    max_fb_size is indeed 68MB.
    but fb_mmio_base = low_mmio_base = 4GB - 512MB, and
    low_mmio_size = 512MB, i.e. the 'excess_fb_size' is not
    considered on ARM64!

    In this case, we'd like to reserve
    min(low_mmio_size/2, 128MB) = 128MB for the framebuffer
    mmio, since the max possible framebuffer so far is 128MB.

With the below command:
   Set-VM -LowMemoryMappedIoSpace 3GB \
          -VMName decui-u2204-gen2-fb
   // i.e. the default setting on Azure. Unlike x86-64, an ARM64
   // VM on Azure has 3GB of mmio below 4GB.
   Set-VMVideo -VMName decui-u2204-gen2-fb \
               -HorizontalResolution 4834 \
               -VerticalResolution 3622 \
               -ResolutionType Single
we have:
    max_fb_size = round_up_to_2MB(4834*3622*4) = 68 MB
    low_mmio_base = 4GB - 3GB = 1GB = 0x40000000
    low_mmio_size = 3GB
    fb_mmio_base = low_mmio_base = 1GB

    In this case, we'd like to reserve
    min(low_mmio_size/2, 128MB) = 128MB for the framebuffer
    mmio, since the max possible framebuffer so far is 128MB.

************************************

To recap, I think the bottom line is:

a) For Gen2 VMs, we can safely reserve a mmio range starting at
   sysfb_primary_display.screen.lfb_base with a size of
   min(low_mmio_size/2, 128MB).

   If sysfb_primary_display.screen.lfb_base is 0, i.e. in the case
   of kdump kernel, we use low_mmio_base instead.
   This should fix the mmio conflict in the kdump kernel.

b) For Gen1 VMs, let's still only reserve a mmio range starting at
   4GB - 128MB with a size of 64MB, because when we are in
   vmbus_reserve_fb(), we still don't know the exact size of the
   max_fb_size, and we don't want to reserve too much as we would
   want to reserve some low mmio space for PCI devices with 32-bit
   BARs (if any).

   If the user runs Set-VMVideo and needs a framebuffer size
   bigger than 64MB (IMO this is not a typical scenario in
   practice), we have to use high mmio for hyperv_drm in the first
   kernel, and the kdump kernel still suffers from the mmio
   conflict between hyperv_drm and hv_pci. We encourage Gen1 VM
   users to upgrade to Gen2 VMs to resolve the issue. Anyway, the
   mmio conflict is inevitable for Gen1 VMs, if the max required
   framebuffer size is bigger than 108MB (Note: the low mmio below
   VTPM_BASE_ADDRESS is 128MB - 18.75MB = 109.25MB, and the required
   framebuffer size is always rounded up to 2MB).

c) CVMs don't have the framebuffer device, so we don't need to
   reserve any mmio in vmbus_reserve_fb() for them.
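
The policy in a) to c) can be summarized in a few lines. This is
only a sketch of the rule as described in this email, not the code
of the patch; the function name and its parameters are hypothetical,
and lfb_base stands for sysfb_primary_display.screen.lfb_base.

```python
MB = 1 << 20
GB = 1 << 30


def fb_reservation(is_gen1, is_cvm, lfb_base, low_mmio_base, low_mmio_size):
    """Return (start, size) in bytes to reserve, or None for no reservation."""
    if is_cvm:
        # c) CVMs have no framebuffer device.
        return None
    if is_gen1:
        # b) Gen1: fixed 64MB at 4GB - 128MB.
        return 4 * GB - 128 * MB, 64 * MB
    # a) Gen2: fall back to low_mmio_base when lfb_base is 0 (kdump).
    start = lfb_base if lfb_base else low_mmio_base
    size = min(low_mmio_size // 2, 128 * MB)
    return start, size


if __name__ == "__main__":
    # Gen2 kdump kernel (lfb_base == 0) with a 136MB low mmio range.
    start, size = fb_reservation(False, False, 0, 0xF7800000, 136 * MB)
    print(hex(start), size // MB)  # 0xf7800000 68
```

The 136MB, 150MB and 1GB examples earlier in this email give
reservations of 68MB, 75MB and 128MB respectively.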

Thanks for reading through this long email!

I'm making a patch right now...

Thanks,
Dexuan


* RE: [PATCH v2] PCI: hv: Allocate MMIO from above 4GB for the config window
  2026-04-15 15:30       ` Dexuan Cui
@ 2026-04-15 16:46         ` Dexuan Cui
  2026-04-23 17:40         ` Michael Kelley
  1 sibling, 0 replies; 10+ messages in thread
From: Dexuan Cui @ 2026-04-15 16:46 UTC (permalink / raw)
  To: Dexuan Cui, Michael Kelley, KY Srinivasan, Haiyang Zhang,
	wei.liu@kernel.org, Long Li, lpieralisi@kernel.org,
	kwilczynski@kernel.org, mani@kernel.org, robh@kernel.org,
	bhelgaas@google.com, Jake Oshins, linux-hyperv@vger.kernel.org,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	matthew.ruffell@canonical.com, kjlx@templeofstupid.com
  Cc: Krister Johansen, stable@vger.kernel.org

> From: Dexuan Cui <DECUI@microsoft.com>
> Sent: Wednesday, April 15, 2026 8:31 AM
>  ...
> 4) For Gen1 VMs, the framebuffer always starts at the fixed
>    location 4G-128MB.
> 
>    4.1) By default, the low mmio range also starts at 4G-128MB,
>         and the size is 127.75 MB, i.e. if
>         hdev->channel->offermsg.offer.mmio_megabytes needs 128MB,
>         the guest hyperv_drm driver can't find enough available
>         mmio in the low mmio range, and has to use the high mmio
>         range.
> 
>    4.2) With "Set-VM -LowMemoryMappedIoSpace 1GB", the
>         low_mmio_base is 3GB, the low_mmio_size=1023.75 MB. The
>         fb_mmio_base is still 4G-128MB, i.e. if hyperv_drm needs
>         128 MB of mmio, it still has to use the high mmio range.

Well, this is inaccurate: in this case we could reserve 128MB low
mmio for hyperv_drm, but this is not really what we want: our
purpose is that we reserve the "initial" framebuffer mmio range so that
hyperv_drm in the first kernel doesn't have to relocate the framebuffer
mmio range. Even if we reserve 128MB low mmio for hyperv_drm
starting at 1GB:

a) hyperv_drm can be blacklisted by the users, so from the host's
perspective it's still the "initial" framebuffer mmio range that takes
effect, and we can still have the mmio conflict in the kdump kernel.

b) hyperv_drm can load after hv_pci, so we can even have the mmio
conflict in the first kernel.

> On an ARM64 lab host, I also tested Gen2 VMs (there is no Gen1 VM
> for ARM VMs):
> 
> By default:
>   low_mmio_base = 4GB - 512MB, i.e. 0xe0000000
>   low_mmio_size = 512MB
>   fb_mmio_base = low_mmio_base
>   The default framebuffer size is 3MB
>   (i.e. screen.lfb_size = 3MB) but hyperv_drm:
>   mmio_megabytes = 8 MB, which supports up to 1920 * 1080.
> 
> With the below commands:
>    Set-VM -LowMemoryMappedIoSpace 512MB \
>           -VMName decui-u2204-gen2-fb
>    // the command only accepts a value between 512MB and 3.5GB.
>    Set-VMVideo -VMName decui-u2204-gen2-fb \
>                -HorizontalResolution 4834 \
>                -VerticalResolution 3622 \
>                -ResolutionType Single
> I thought we would have:
>     max_fb_size = round_up_to_2MB(4834*3622*4) = 68 MB
>     excess_fb_size = 4MB
>     low_mmio_base = 4GB - 512MB - 4MB * 2
>                   = 4GB - 520MB
>     fb_mmio_base = low_mmio_base
>     low_mmio_size = 4GB - low_mmio_base = 520MB
> 
>     Since 4GB - target_low_mmio_size = 4GB - 512MB, which is
>     smaller than low_mmio_base, so low_mmio_base and

Sorry for the typo: here the "smaller" should be "bigger".

>     fb_mmio_base would be both set to 4GB - 520MB, and
>     low_mmio_size would be 520MB.
> 
>     However, the actual result is:
>     max_fb_size is indeed 68MB.
>     but fb_mmio_base = low_mmio_base = 4GB - 512MB, and
>     low_mmio_size = 512MB, i.e. the 'excess_fb_size' is not
>     considered on ARM64!

I think this makes sense since "low_mmio_size = 512MB" is
already big enough for the framebuffer.

>     In this case, we'd like to reserve
>     min(low_mmio_size/2, 128MB) = 128MB for the framebuffer
>     mmio, since the max possible framebuffer so far is 128MB.
> 
 
Thanks,
Dexuan


* RE: [PATCH v2] PCI: hv: Allocate MMIO from above 4GB for the config window
  2026-04-08  9:24   ` Dexuan Cui
  2026-04-08 13:53     ` Michael Kelley
@ 2026-04-16 18:49     ` Dexuan Cui
  1 sibling, 0 replies; 10+ messages in thread
From: Dexuan Cui @ 2026-04-16 18:49 UTC (permalink / raw)
  To: Michael Kelley, KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org,
	Long Li, lpieralisi@kernel.org, kwilczynski@kernel.org,
	mani@kernel.org, robh@kernel.org, bhelgaas@google.com,
	Jake Oshins, linux-hyperv@vger.kernel.org,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	matthew.ruffell@canonical.com, kjlx@templeofstupid.com
  Cc: Krister Johansen, stable@vger.kernel.org

> > ...
> > This approach could be taken one step further, where vmbus_reserve_fb()
> > *always* reserves 64 MiB starting at the low end of low MMIO space,
> > regardless of the value of "start". The messy code for getting "start"
> > could be dropped entirely, and the dependency on CONFIG_SYSFB goes
> > away. Or maybe still get the value of "start" and "size", and if non-zero
> > just do a sanity check that they are within the fixed 64 MiB reserved area.

My earlier reply yesterday explains why we shouldn't get rid of the
screen.lfb_base. I'm trying to make as few assumptions as possible.

> > Thoughts? To me tweaking vmbus_reserve_fb() is a more
> > straightforward and explicit way to do the reserving, vs. modifying
> > the requested range in the Hyper-V PCI driver.
> 
> Agreed. Let me try to make a new patch for review.

I just posted a patch here:
https://lwn.net/ml/linux-kernel/20260416183529.838321-1-decui%40microsoft.com/
Please review.

The new patch changes the vmbus driver. With it, the previous v2
pci-hyperv patch is no longer necessary.

Thanks,
-- Dexuan


* RE: [PATCH v2] PCI: hv: Allocate MMIO from above 4GB for the config window
  2026-04-15 15:30       ` Dexuan Cui
  2026-04-15 16:46         ` Dexuan Cui
@ 2026-04-23 17:40         ` Michael Kelley
  2026-04-29  1:58           ` Dexuan Cui
  1 sibling, 1 reply; 10+ messages in thread
From: Michael Kelley @ 2026-04-23 17:40 UTC (permalink / raw)
  To: Dexuan Cui, Michael Kelley, KY Srinivasan, Haiyang Zhang,
	wei.liu@kernel.org, Long Li, lpieralisi@kernel.org,
	kwilczynski@kernel.org, mani@kernel.org, robh@kernel.org,
	bhelgaas@google.com, Jake Oshins, linux-hyperv@vger.kernel.org,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	matthew.ruffell@canonical.com, kjlx@templeofstupid.com
  Cc: Krister Johansen, stable@vger.kernel.org

From: Dexuan Cui <DECUI@microsoft.com> Sent: Wednesday, April 15, 2026 8:31 AM
> 
> > From: Michael Kelley <mhklinux@outlook.com> Sent: Wednesday, April 8, 2026 6:54 AM

[snip]

> 
> Another example is: for a Gen2 VM with the below commands:
>    Set-VM -LowMemoryMappedIoSpace 1GB \
>           -VMName decui-u2204-gen2-fb
>    // i.e. the default setting on Azure. Let's ignore CVMs here.

FWIW, I'm seeing that in Gen2 VMs in Azure, the low_mmio_size
is 3 GiB. I'm looking at a D16ds_v5, and a D16lds_v6. The v5 VM
is newly created, while the v6 has been around for a few months.
In a CVM, the low_mmio_size should be 1 GiB. This overall example
is still correct -- it's just the comment that I have doubts about. Or
maybe you are looking at a different VM size that has a different
default?

Some years back, I had gotten into a discussion with Azure about
this size because the swiotlb memory wants to be allocated below
the 4 GiB line, and reserving 3 GiB for low mmio limited the size
of the swiotlb. CVMs were changed to have only 1 GiB for low
mmio because they need a larger swiotlb.


>    Set-VMVideo -VMName decui-u2204-gen2-fb \
>                -HorizontalResolution 4834 \
>                -VerticalResolution 3622 \
>                -ResolutionType Single
> we have:
>     max_fb_size = round_up_to_2MB(4834*3622*4) = 68 MB
>     excess_fb_size = 4MB
>     low_mmio_base = 4GB - 128MB - 4MB * 2
>                   = 4GB - 136 MB = 0xf7800000
>     but 4GB - target_low_mmio_size = 4GB - 1GB, which is
>     smaller than low_mmio_base, so low_mmio_base and
>     fb_mmio_base are both set to 4GB - 1GB = 0xc0000000,
>     and low_mmio_size = 1GB.
>     In this case, we'd like to reserve
>     min(low_mmio_size/2, 128MB) = 128MB for the framebuffer
>     mmio, since the max possible framebuffer so far is 128MB.
> 
> ************************************
> 
> On an ARM64 lab host, I also tested Gen2 VMs (there is no Gen1 VM
> for ARM VMs):
> 
> By default:
>   low_mmio_base = 4GB - 512MB, i.e. 0xe0000000
>   low_mmio_size = 512MB
>   fb_mmio_base = low_mmio_base
>   The default framebuffer size is 3MB
>   (i.e. screen.lfb_size = 3MB) but hyperv_drm:
>   mmio_megabytes = 8 MB, which supports up to 1920 * 1080.
> 
> With the below commands:
>    Set-VM -LowMemoryMappedIoSpace 512MB \
>           -VMName decui-u2204-gen2-fb
>    // the command only accepts a value between 512MB and 3.5GB.
>    Set-VMVideo -VMName decui-u2204-gen2-fb \
>                -HorizontalResolution 4834 \
>                -VerticalResolution 3622 \
>                -ResolutionType Single
> I thought we would have:
>     max_fb_size = round_up_to_2MB(4834*3622*4) = 68 MB
>     excess_fb_size = 4MB
>     low_mmio_base = 4GB - 512MB - 4MB * 2
>                   = 4GB - 520MB
>     fb_mmio_base = low_mmio_base
>     low_mmio_size = 4GB - low_mmio_base = 520MB
> 
>     Since 4GB - target_low_mmio_size = 4GB - 512MB, which is
>     smaller than low_mmio_base, so low_mmio_base and
>     fb_mmio_base would be both set to 4GB - 520MB, and
>     low_mmio_size would be 520MB.
> 
>     However, the actual result is:
>     max_fb_size is indeed 68MB.
>     but fb_mmio_base = low_mmio_base = 4GB - 512MB, and
>     low_mmio_size = 512MB, i.e. the 'excess_fb_size' is not
>     considered on ARM64!
> 
>     In this case, we'd like to reserve
>     min(low_mmio_size/2, 128MB) = 128MB for the framebuffer
>     mmio, since the max possible framebuffer so far is 128MB.
> 
> With the below command:
>    Set-VM -LowMemoryMappedIoSpace 3GB \
>           -VMName decui-u2204-gen2-fb
>    // i.e. the default setting on Azure. Unlike x86-64, an ARM64
>    // VM on Azure has 3GB of mmio below 4GB.

See my previous comment on the same topic. I think arm64
and x86/x64 are the same.

>    Set-VMVideo -VMName decui-u2204-gen2-fb \
>                -HorizontalResolution 4834 \
>                -VerticalResolution 3622 \
>                -ResolutionType Single
> we have:
>     max_fb_size = round_up_to_2MB(4834*3622*4) = 68 MB
>     low_mmio_base = 4GB - 3GB = 1GB = 0x40000000
>     low_mmio_size = 3GB
>     fb_mmio_base = low_mmio_base = 1GB
> 
>     In this case, we'd like to reserve
>     min(low_mmio_size/2, 128MB) = 128MB for the framebuffer
>     mmio, since the max possible framebuffer so far is 128MB.
> 
> ************************************
> 
> To recap, I think the bottom line is:
> 
> a) For Gen2 VMs, we can safely reserve a mmio range starting at
>    sysfb_primary_display.screen.lfb_base with a size of
>    min(low_mmio_size/2, 128MB).
> 
>    If sysfb_primary_display.screen.lfb_base is 0, i.e. in the case
>    of kdump kernel, we use low_mmio_base instead.
>    This should fix the mmio conflict in the kdump kernel.
> 
> b) For Gen1 VMs, let's still only reserve a mmio range starting at
>    4GB - 128MB with a size of 64MB, because when we are in
>    vmbus_reserve_fb(), we still don't know the exact size of the
>    max_fb_size, and we don't want to reserve too much as we would
>    want to reserve some low mmio space for PCI devices with 32-bit
>    BARs (if any).
> 
>    If the user runs Set-VMVideo and needs a framebuffer size
>    bigger than 64MB (IMO this is not a typical scenario in
>    practice), we have to use high mmio for hyperv_drm in the first
>    kernel, and the kdump kernel still suffers from the mmio
>    conflict between hyperv_drm and hv_pci. We encourage Gen1 VM
>    users to upgrade to Gen2 VMs to resolve the issue. Anyway, the
>    mmio conflict is inevitable for Gen1 VMs, if the max required
>    framebuffer size is bigger than 108MB (Note:
>    128MB - VTPM_BASE_ADDRESS = 109.25, and the required framebuffer
>    size is always rounded up to 2MB).

Question about Gen 1 VMs: If the Linux frame buffer driver moves
the frame buffer somewhere other than the default location, and
then the VM does a kexec/kdump, what does the legacy PCI graphic
device BAR report as the frame buffer location? Does it *always*
report 4G-128MB, or does it report the new location? I can run
an experiment to find out, but maybe you've already done so and
not reported that detail here.

Michael


* RE: [PATCH v2] PCI: hv: Allocate MMIO from above 4GB for the config window
  2026-04-23 17:40         ` Michael Kelley
@ 2026-04-29  1:58           ` Dexuan Cui
  2026-04-29 18:01             ` Michael Kelley
  0 siblings, 1 reply; 10+ messages in thread
From: Dexuan Cui @ 2026-04-29  1:58 UTC (permalink / raw)
  To: Michael Kelley, KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org,
	Long Li, lpieralisi@kernel.org, kwilczynski@kernel.org,
	mani@kernel.org, robh@kernel.org, bhelgaas@google.com,
	Jake Oshins, linux-hyperv@vger.kernel.org,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	matthew.ruffell@canonical.com, kjlx@templeofstupid.com
  Cc: Krister Johansen, stable@vger.kernel.org

> From: Michael Kelley <mhklinux@outlook.com>
> Sent: Thursday, April 23, 2026 10:40 AM
> > ...
> > Another example is: for a Gen2 VM with the below commands:
> >    Set-VM -LowMemoryMappedIoSpace 1GB \
> >           -VMName decui-u2204-gen2-fb
> >    // i.e. the default setting on Azure. Let's ignore CVMs here.

Sorry for the incorrect statement: this is not the default setting
on Azure. The default for regular VMs on Azure should be
"-LowMemoryMappedIoSpace 3GB". Not sure how I made the
incorrect statement -- I guess I might have confused my local VM
with my Azure VM, and at some moment, I might have mistaken
the meaning of the "-LowMemoryMappedIoSpace" parameter:
for that local VM, I might somehow have incorrectly thought that the
param means low_mmio_base rather than low_mmio_size.

> FWIW, I'm seeing that in Gen2 VMs in Azure, the low_mmio_size
> is 3 GiB. I'm looking at a D16ds_v5, and a D16lds_v6. The v5 VM
> is newly created, while the v6 has been around for a few months.

This is also my observation, after I double checked my Azure VM.

> In a CVM, the low_mmio_size should be 1 GiB. This overall example
> is still correct -- it's just the comment that I have doubts about. Or
> maybe you are looking at a different VM size that has a different
> default?

For CVMs, yes, the low_mmio_size is 1GB.

> 
> Some years back, I had gotten into a discussion with Azure about
> this size because the swiotlb memory wants to be allocated below
> the 4 GiB line, and reserving 3 GiB for low mmio limited the size
> of the swiotlb. CVMs were changed to have only 1 GiB for low
> mmio because they need a larger swiotlb.

Right, I also remember the story. :-)

> > With the below command:
> >    Set-VM -LowMemoryMappedIoSpace 3GB \
> >           -VMName decui-u2204-gen2-fb
> >    // i.e. the default setting on Azure. Unlike x86-64, an ARM64
> >    // VM on Azure has 3GB of mmio below 4GB.
> 
> See my previous comment on the same topic. I think arm64
> and x86/x64 are the same.

Agreed.

> Question about Gen 1 VMs: If the Linux frame buffer driver moves
> the frame buffer somewhere other than the default location, and
> then the VM does a kexec/kdump, what does the legacy PCI graphic
> device BAR report as the frame buffer location? Does it *always*
> report 4G-128MB, or does it report the new location? I can run

It always reports 4G-128MB.
BTW, I suspect a Gen2 VM may have the same issue, i.e.
currently we only reserve 8MB below 4GB; if hyperv_drm uses
high MMIO, I suspect the UEFI firmware would still report the
same original low MMIO framebuffer base/size to the kdump kernel,
but there is no easy way to verify this for Gen2 VMs...

> an experiment to find out, but maybe you've already done so and
> not reported that detail here.
> 
> Michael

I have a Gen1 Ubuntu 22.04 VM, and I ran the below commands:
Set-VM -LowMemoryMappedIoSpace 128MB -VMName decui-u2204-gen1-fb
Set-VMVideo -VMName decui-u2204-gen1-fb -HorizontalResolution 7680 -VerticalResolution 4320 -ResolutionType Single

When the VM boots up, we reserve 64MB at 4G-128MB:
[   11.492075] hv_vmbus: hv_mmio=[mem 0xf8000000-0xfed3ffff],[mem 0xfe0000000-0xfffffffff] fb=[mem 0xf8000000-0xfbffffff]

Since the required mmio size in the hyperv_drm driver is 128MB:
[   28.631923] hyperv_connect_vsp: hyperv_drm: mmio_megabytes=128 MB
the driver has to allocate MMIO from the high MMIO space, because 
we only reserve 64MB below 4GB, and the available low_mmio_size is
smaller than 128MB due to the vTPM MMIO range:

# cat /proc/iomem
00000000-00000fff : Reserved
00001000-0009fbff : System RAM
0009fc00-0009ffff : Reserved
000a0000-000bffff : PCI Bus 0000:00
000c0000-000c7fff : Video ROM
000e0000-000fffff : Reserved
  000f0000-000fffff : System ROM
00100000-f7feffff : System RAM
  d7000000-f6ffffff : Crash kernel
f7ff0000-f7ffefff : ACPI Tables
f7fff000-f7ffffff : ACPI Non-volatile Storage
f8000000-fffbffff : PCI Bus 0000:00
  f8000000-fbffffff : 0000:00:08.0
  fec00000-fec003ff : IOAPIC 0
  fee00000-fee00fff : PNP0C02:01
fffc0000-ffffffff : PNP0C01:00
100000000-507ffffff : System RAM
  281600000-28295449f : Kernel code
  282a00000-283746fff : Kernel rodata
  283800000-283c5287f : Kernel data
  28411a000-2845fffff : Kernel bss
fe0000000-fffffffff : PCI Bus 0000:00
  fe0000000-fe7ffffff : 5620e0c7-8062-4dce-aeb7-520c7ef76171
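
The sizes behind the ranges above can be checked with a small
script. It is only a sketch: the ranges are copied by hand from the
hv_vmbus log line and the /proc/iomem output, the helper name is
hypothetical, and I am assuming that the 128MB range above 4GB is
the one claimed by hyperv_drm.

```python
MB = 1 << 20


def span_size(text):
    """Size in bytes of an inclusive 'start-end' hex range."""
    start, end = (int(part, 16) for part in text.split("-"))
    return end - start + 1


# Ranges copied from the hv_vmbus log line and /proc/iomem above.
low_mmio = span_size("f8000000-fed3ffff")      # low part of hv_mmio
reserved_fb = span_size("f8000000-fbffffff")   # fb= range, 0000:00:08.0
high_alloc = span_size("fe0000000-fe7ffffff")  # range claimed above 4GB

# 7680x4320 at 4 bytes per pixel, rounded up to a 2MB multiple.
fb_bytes = 7680 * 4320 * 4
needed = -(-fb_bytes // (2 * MB)) * (2 * MB)

# Prints: 109.25 64.0 128.0 128.0
print(low_mmio / MB, reserved_fb / MB, high_alloc / MB, needed / MB)

# The request is bigger than the whole usable low range, so it cannot
# be satisfied below 4GB.
print(needed > low_mmio)  # True
```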

However, when the kdump kernel starts to run, and I print the
pci_resource_start(pdev, 0) and pci_resource_len(pdev, 0)
from vmbus_reserve_fb(), I still see 4G-128MB:
[   12.506159] Gen1 VM: start=0xf8000000, size=0x4000000

In this case, we can't really fix the MMIO conflict, e.g.
if both hv_pci and hyperv_drm are built as modules, then
the order of loading them can be nondeterministic: if the order
in the first kernel is different from the order in
the kdump kernel, we run into trouble.

If the order is deterministic (e.g. hv_pci is
built-in, and hyperv_drm is built as a module),
we should be good since both allocate MMIO from
the high MMIO range in a deterministic way.

Thanks,
Dexuan


* RE: [PATCH v2] PCI: hv: Allocate MMIO from above 4GB for the config window
  2026-04-29  1:58           ` Dexuan Cui
@ 2026-04-29 18:01             ` Michael Kelley
  0 siblings, 0 replies; 10+ messages in thread
From: Michael Kelley @ 2026-04-29 18:01 UTC (permalink / raw)
  To: Dexuan Cui, Michael Kelley, KY Srinivasan, Haiyang Zhang,
	wei.liu@kernel.org, Long Li, lpieralisi@kernel.org,
	kwilczynski@kernel.org, mani@kernel.org, robh@kernel.org,
	bhelgaas@google.com, Jake Oshins, linux-hyperv@vger.kernel.org,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	matthew.ruffell@canonical.com, kjlx@templeofstupid.com
  Cc: Krister Johansen, stable@vger.kernel.org

From: Dexuan Cui <DECUI@microsoft.com> Sent: Tuesday, April 28, 2026 6:58 PM
> > From: Michael Kelley <mhklinux@outlook.com> Sent: Thursday, April 23, 2026 10:40 AM

[snip]

> 
> > Question about Gen 1 VMs: If the Linux frame buffer driver moves
> > the frame buffer somewhere other than the default location, and
> > then the VM does a kexec/kdump, what does the legacy PCI graphic
> > device BAR report as the frame buffer location? Does it *always*
> > report 4G-128MB, or does it report the new location? I can run
> 
> It always reports 4G-128MB.

OK, good to know. I was hoping it might report the new location. :-(

> BTW,  I suspect a Gen2 VM may have the same issue, i.e.
> currently we only reserve 8MB below 4GB; if hyperv_drm uses
> high MMIO, I suspect the UEFI firmware would still report the
> same original low MMIO framebuffer base/size to the kdump kernel,
> but there is no easy way to verify this for Gen2 VMs...
> 

[snip]

> 
> However,  when the kdump kernel starts to run, and I print the
> pci_resource_start(pdev, 0) and pci_resource_len(pdev, 0)
> from vmbus_reserve_fb(), I still see 4G-128MB:
> [   12.506159] Gen1 VM: start=0xf8000000, size=0x4000000
> 
> In this case, we can't really fix the MMIO conflict, e.g.
> if both hv_pci and hyperv_drm are built as modules, then
> the order of loading them can be nondeterministic:if the order
> in the first kernel is different from the order in
> the kdump kernel, we run into trouble.

Yep.

> 
> If the order is deterministic (e.g. hv_pci is
> built-in, and hyperv_drm is built as a module),
> we should be good since both allocates MMIO from
> the high MMIO range in a deterministic way.
> 

Yep.

Thanks,

Michael


Thread overview: 10+ messages
2026-04-02 23:43 [PATCH v2] PCI: hv: Allocate MMIO from above 4GB for the config window Dexuan Cui
2026-04-05 23:15 ` Michael Kelley
2026-04-08  9:24   ` Dexuan Cui
2026-04-08 13:53     ` Michael Kelley
2026-04-15 15:30       ` Dexuan Cui
2026-04-15 16:46         ` Dexuan Cui
2026-04-23 17:40         ` Michael Kelley
2026-04-29  1:58           ` Dexuan Cui
2026-04-29 18:01             ` Michael Kelley
2026-04-16 18:49     ` Dexuan Cui
