* [PATCH v2] PCI: hv: Allocate MMIO from above 4GB for the config window
@ 2026-04-02 23:43 Dexuan Cui
2026-04-05 23:15 ` Michael Kelley
0 siblings, 1 reply; 4+ messages in thread
From: Dexuan Cui @ 2026-04-02 23:43 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, longli, lpieralisi, kwilczynski,
mani, robh, bhelgaas, jakeo, linux-hyperv, linux-pci,
linux-kernel, mhklinux, matthew.ruffell, kjlx
Cc: Krister Johansen, stable
There has been a longstanding MMIO conflict between the pci_hyperv
driver's config_window (see hv_allocate_config_window()) and the
hyperv_drm (or hyperv_fb) driver (see hyperv_setup_vram()): typically
both get MMIO from the low MMIO range below 4GB. This is not an issue
in the normal kernel, since the VMBus driver reserves the framebuffer
MMIO range in vmbus_reserve_fb(), so the drm driver's hyperv_setup_vram()
can always get the reserved framebuffer MMIO. However, a Gen2 VM's
kdump kernel can fail to reserve the framebuffer MMIO in
vmbus_reserve_fb() because screen_info.lfb_base is zero in the
kdump kernel, for several possible reasons (see the Link below for
more details):
1) on ARM64, the two syscalls (KEXEC_LOAD, KEXEC_FILE_LOAD) don't
initialize screen_info.lfb_base for the kdump kernel;
2) on x86-64, the KEXEC_FILE_LOAD syscall initializes the kdump kernel's
screen_info.lfb_base, but the KEXEC_LOAD syscall doesn't do that
when the hyperv_drm driver is loaded, because the user-space kexec-tools
(i.e. the program 'kexec') doesn't recognize the hyperv_drm driver
(ignoring the behavior of very old versions of kexec-tools).
When vmbus_reserve_fb() fails to reserve the framebuffer MMIO in the
kdump kernel, if pci_hyperv in the kdump kernel loads before hyperv_drm
loads, pci_hyperv's vmbus_allocate_mmio() gets the framebuffer MMIO
and tries to use it, but since the host thinks that the MMIO range is
still in use by hyperv_drm, the host refuses to accept the MMIO range
as the config window, and pci_hyperv's hv_pci_enter_d0() errors out,
e.g. an error can be "PCI Pass-through VSP failed D0 Entry with status
c0370048".
In the past, this pci_hyperv error in the kdump kernel was typically
not fatal, because the kdump kernel usually doesn't rely on pci_hyperv,
i.e. the root file system is on a VMBus SCSI device.
Now, a VM on Azure can boot from NVMe, i.e. the root file system can be
on an NVMe device, which depends on pci_hyperv. When the error occurs,
the kdump kernel fails to boot because no root file system is detected.
Fix the MMIO conflict by allocating MMIO above 4GB for the config_window,
so it won't conflict with hyperv_drm's MMIO, which should be below the
4GB boundary. The size of config_window is small: it's only 8KB per PCI
device, so there should be sufficient MMIO space available above 4GB.
Note: we still need to figure out how to address the possible MMIO
conflict between hyperv_drm and pci_hyperv in the case of 32-bit PCI
MMIO BARs, but that's of low priority because all PCI devices available
to a Linux VM on Azure or on a modern host should use 64-bit BARs and
should not use 32-bit BARs -- I checked Mellanox VFs, MANA VFs, NVMe
devices, and GPUs in Linux VMs on Azure, and found no 32-bit BARs.
Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
Link: https://lore.kernel.org/all/SA1PR21MB692176C1BC53BFC9EAE5CF8EBF51A@SA1PR21MB6921.namprd21.prod.outlook.com/
Tested-by: Matthew Ruffell <matthew.ruffell@canonical.com>
Tested-by: Krister Johansen <johansen@templeofstupid.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: stable@vger.kernel.org
---
Changes since v1:
Updated the commit message and the comment to better explain
why screen_info.lfb_base can be 0 in the kdump kernel.
No code change since v1.
drivers/pci/controller/pci-hyperv.c | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 2c7a406b4ba8..1a79334ea9f4 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -3403,9 +3403,26 @@ static int hv_allocate_config_window(struct hv_pcibus_device *hbus)
/*
* Set up a region of MMIO space to use for accessing configuration
- * space.
+ * space. Use the high MMIO range to not conflict with the hyperv_drm
+ * driver (which normally gets MMIO from the low MMIO range) in the
+ * kdump kernel of a Gen2 VM, which may fail to reserve the framebuffer
+ * MMIO range in vmbus_reserve_fb() due to screen_info.lfb_base being
+ * zero in the kdump kernel:
+ *
+ * on ARM64, the two syscalls (KEXEC_LOAD, KEXEC_FILE_LOAD) don't
+ * initialize the screen_info.lfb_base for the kdump kernel;
+ *
+ * on x86-64, the KEXEC_FILE_LOAD syscall initializes kdump kernel's
+ * screen_info.lfb_base (see bzImage64_load() -> setup_boot_parameters())
+ * but the KEXEC_LOAD syscall doesn't really do that when the hyperv_drm
+ * driver loads, because the user-space program 'kexec' doesn't
+ * recognize hyperv_drm: see the function setup_linux_vesafb() in the
+ * kexec-tools.git repo. Note: old versions of kexec-tools, e.g.
+ * v2.0.18, initialize screen_info.lfb_base if the hyperv_fb driver
+ * loads, but hyperv_fb is deprecated and has been removed from the
+ * mainline kernel.
*/
- ret = vmbus_allocate_mmio(&hbus->mem_config, hbus->hdev, 0, -1,
+ ret = vmbus_allocate_mmio(&hbus->mem_config, hbus->hdev, SZ_4G, -1,
PCI_CONFIG_MMIO_LENGTH, 0x1000, false);
if (ret)
return ret;
--
2.43.0
^ permalink raw reply related	[flat|nested] 4+ messages in thread

* RE: [PATCH v2] PCI: hv: Allocate MMIO from above 4GB for the config window
  2026-04-02 23:43 [PATCH v2] PCI: hv: Allocate MMIO from above 4GB for the config window Dexuan Cui
@ 2026-04-05 23:15 ` Michael Kelley
  2026-04-08  9:24   ` Dexuan Cui
  0 siblings, 1 reply; 4+ messages in thread
From: Michael Kelley @ 2026-04-05 23:15 UTC (permalink / raw)
  To: Dexuan Cui, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, longli@microsoft.com, lpieralisi@kernel.org,
	kwilczynski@kernel.org, mani@kernel.org, robh@kernel.org,
	bhelgaas@google.com, jakeo@microsoft.com,
	linux-hyperv@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, Michael Kelley,
	matthew.ruffell@canonical.com, kjlx@templeofstupid.com
  Cc: Krister Johansen, stable@vger.kernel.org

From: Dexuan Cui <decui@microsoft.com> Sent: Thursday, April 2, 2026 4:43 PM
>
> ...
> Note: we still need to figure out how to address the possible MMIO
> conflict between hyperv_drm and pci_hyperv in the case of 32-bit PCI
> MMIO BARs, but that's of low priority because all PCI devices available
> to a Linux VM on Azure or on a modern host should use 64-bit BARs and
> should not use 32-bit BARs -- I checked Mellanox VFs, MANA VFs, NVMe
> devices, and GPUs in Linux VMs on Azure, and found no 32-bit BARs.
Just to clarify, since this patch is predicated on all BARs being 64-bit,
hv_pci_allocate_bridge_windows() never encounters a non-zero
hbus->low_mmio_space, and hence also never allocates from low MMIO
space. So hv_pci_allocate_bridge_windows() does not need to be patched.
Is that correct?

Taking a broader view, fundamentally the current MMIO location of the
frame buffer may be unknown to the Linux guest. At the same time, Linux
must ensure that PCI devices don't get assigned to the MMIO space where
the frame buffer is located. While the current MMIO location of the
frame buffer may be unknown, we can assume it was placed in low MMIO
space by the host -- either Windows Hyper-V or Linux/VMM in the root
partition, and perhaps as mediated by a paravisor. Probably need to
confirm with the Linux-in-the-root-partition team (and maybe the
OpenHCL team) that this assumption is true. Presumably the hyperv_drm
driver doesn't need to move the frame buffer, but if it does, it must
stay in the low MMIO space.

This patch depends on this assumption, and effectively reserves the
entire low MMIO space for the frame buffer. The low MMIO space size
defaults to 128 MiB on a local Hyper-V, and is set to 3 GiB in most
Azure VMs (or to 1 GiB in an Azure CVM), so that all gets reserved.

A slightly different approach to the whole problem is to change
vmbus_reserve_fb(). If it is unable to get a non-zero "start" value,
then it should use the same assumption as above, and reserve a frame
buffer area starting at the lowest address in low MMIO space. The
reserved size could be the max possible frame buffer size, which I
think is 64 MiB (?). This still leaves low MMIO space for subsequent
PCI devices, and allows 32-bit BARs to continue to work. This approach
requires one further assumption, which is that the host, plus any
movement by hyperv_drm, has kept the frame buffer at the low end of
the low MMIO space.
From what I've seen, that assumption is reality -- the frame buffer
always starts at the beginning of low MMIO space.

This approach could be taken one step further, where vmbus_reserve_fb()
*always* reserves 64 MiB starting at the low end of low MMIO space,
regardless of the value of "start". The messy code for getting "start"
could be dropped entirely, and the dependency on CONFIG_SYSFB goes
away. Or maybe still get the value of "start" and "size", and if
non-zero just do a sanity check that they are within the fixed 64 MiB
reserved area.

Thoughts? To me tweaking vmbus_reserve_fb() is a more straightforward
and explicit way to do the reserving, vs. modifying the requested range
in the Hyper-V PCI driver. And FWIW, it avoids introducing the 32-bit
BAR limitation.

Michael

> ...

^ permalink raw reply	[flat|nested] 4+ messages in thread
* RE: [PATCH v2] PCI: hv: Allocate MMIO from above 4GB for the config window
  2026-04-05 23:15 ` Michael Kelley
@ 2026-04-08  9:24   ` Dexuan Cui
  2026-04-08 13:53     ` Michael Kelley
  0 siblings, 1 reply; 4+ messages in thread
From: Dexuan Cui @ 2026-04-08  9:24 UTC (permalink / raw)
  To: Michael Kelley, KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org,
	Long Li, lpieralisi@kernel.org, kwilczynski@kernel.org,
	mani@kernel.org, robh@kernel.org, bhelgaas@google.com,
	Jake Oshins, linux-hyperv@vger.kernel.org,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	matthew.ruffell@canonical.com, kjlx@templeofstupid.com
  Cc: Krister Johansen, stable@vger.kernel.org

> From: Michael Kelley <mhklinux@outlook.com>
> Sent: Sunday, April 5, 2026 4:15 PM
> > ...
> > Note: we still need to figure out how to address the possible MMIO
> > conflict between hyperv_drm and pci_hyperv in the case of 32-bit PCI
> > MMIO BARs, but that's of low priority [...]
>
> Just to clarify, since this patch is predicated on all BARs being 64-bit,
> hv_pci_alloc_bridge_windows() never encounters a non-zero
> hbus->low_mmio_space, and hence also never allocates from low
> MMIO space. So hv_pci_alloc_bridge_windows() does not need to be
> patched. Is that correct?

Correct. For 32-bit BARs (if any), IMO we can't really do anything for
them in hv_pci_allocate_bridge_windows(), since they must reside
below 4GB.

Note: while the patch doesn't fix the MMIO conflict if there are any
32-bit BARs, the patch doesn't make things worse for 32-bit BARs (if any).

> Taking a broader view, fundamentally the current MMIO location of
> the frame buffer may be unknown to the Linux guest. At the same time,
> Linux must ensure that PCI devices don't get assigned to the MMIO space
> where the frame buffer is located. While the current MMIO location of
> the frame buffer may be unknown, we can assume it was placed in low
> MMIO space by the host -- either Windows Hyper-V or Linux/VMM
> in the root partition, and perhaps as mediated by a paravisor. Probably
> need to confirm with the Linux-in-the-root partition team (and maybe
> the OpenHCL team) that this assumption is true.

IMO this is a good idea! It looks like the framebuffer base always
starts at the beginning of the low MMIO space. We can reserve some MMIO
for the framebuffer at the beginning of the low MMIO space.

> Presumably the hyperv_drm driver doesn't need to move the frame
> buffer, but if it does, it must stay in the low MMIO space.

It looks like this assumption is true.

> This patch depends on this assumption, and effectively reserves
> the entire low MMIO space for the frame buffer.

To be precise, the patch reserves the entire low MMIO space for the
frame buffer and the 32-bit BARs (if any). There is no MMIO conflict in
the first kernel (assuming hyperv_drm doesn't relocate the MMIO range),
but there can be an MMIO conflict in the kdump/kexec kernel if there is
any 32-bit BAR.

> The low MMIO space size defaults to 128 MiB on a local Hyper-V,

Yes, by default the low MMIO base = 0xf800_0000, size = 128MB, but the
range [0xfed4_0000, 0xffff_ffff], whose size is 18.75MB, is reserved
for vTPM: see vmbus_walk_resources(). So by default the available low
MMIO size for hyperv_drm is 128 - 18.75 = 109.25 MB.

The size of the framebuffer should be aligned to 2MB, so if the
framebuffer size is bigger than 108MB, there is not enough MMIO space
in the low MMIO range. E.g., with the below command:

  Set-VMVideo -VMName vm_name -HorizontalResolution 7680
    -VerticalResolution 4320 -ResolutionType Maximum

the resulting max framebuffer size is 7680 * 4320 * 32/8 / 1024.0/1024
= 126.5625 MB, which would be rounded up to 128MB.

However, according to my testing, with the above command the low MMIO
base = 0xf000_0000, size = 256MB, so it's probably OK to reserve 128 MB
for the frame buffer. In case the low MMIO size is <= 64MB, we would
want to reserve less MMIO for the frame buffer.

> and is set to 3 GiB in most Azure VMs (or to 1 GiB in an Azure CVM),
> so that all gets reserved.
>
> A slightly different approach to the whole problem is to change
> vmbus_reserve_fb(). If it is unable to get a non-zero "start" value, then
> it should use the same assumption as above, and reserve a frame buffer
> area starting at the lowest address in low MMIO space. The reserved size
> could be the max possible frame buffer size, which I think is 64 MiB (?).

It can be 128MB with the highest resolution 7680*4320 (I hope the
highest resolution won't become bigger in the future).

> This still leaves low MMIO space for subsequent PCI devices, and allows
> 32-bit BARs to continue to work. [...]
>
> This approach could be taken one step further, where vmbus_reserve_fb()
> *always* reserves 64 MiB starting at the low end of low MMIO space,
> regardless of the value of "start". [...]
>
> Thoughts? To me tweaking vmbus_reserve_fb() is a more
> straightforward and explicit way to do the reserving, vs. modifying
> the requested range in the Hyper-V PCI driver.

Agreed. Let me try to make a new patch for review.

> And FWIW, it avoids introducing the 32-bit BAR limitation.

This patch addresses the MMIO conflict for 64-bit BARs and not for
32-bit BARs (if any). The patch does not introduce the 32-bit BAR
limitation.

Thanks,
-- Dexuan

^ permalink raw reply	[flat|nested] 4+ messages in thread
* RE: [PATCH v2] PCI: hv: Allocate MMIO from above 4GB for the config window
  2026-04-08  9:24 ` Dexuan Cui
@ 2026-04-08 13:53   ` Michael Kelley
  0 siblings, 0 replies; 4+ messages in thread
From: Michael Kelley @ 2026-04-08 13:53 UTC (permalink / raw)
  To: Dexuan Cui, Michael Kelley, KY Srinivasan, Haiyang Zhang,
	wei.liu@kernel.org, Long Li, lpieralisi@kernel.org,
	kwilczynski@kernel.org, mani@kernel.org, robh@kernel.org,
	bhelgaas@google.com, Jake Oshins, linux-hyperv@vger.kernel.org,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	matthew.ruffell@canonical.com, kjlx@templeofstupid.com
  Cc: Krister Johansen, stable@vger.kernel.org

From: Dexuan Cui <DECUI@microsoft.com> Sent: Wednesday, April 8, 2026 2:24 AM
>
> ...
> Correct. For 32-bit BARs (if any), IMO we can't really do anything for
> them in hv_pci_allocate_bridge_windows(), since they must reside
> below 4GB.
>
> Note: while the patch doesn't fix the MMIO conflict if there are any
> 32-bit BARs, the patch doesn't make things worse for 32-bit BARs (if any).

OK, right. Your patch doesn't prevent 32-bit BARs from working. It just
doesn't fix any potential frame buffer conflicts with 32-bit BARs. I
misinterpreted the situation.

> ...
> It can be 128MB with the highest resolution 7680*4320 (I hope the
> highest resolution won't become bigger in the future).

Indeed!

> ...
> Agreed. Let me try to make a new patch for review.
>
> > And FWIW, it avoids introducing the 32-bit BAR limitation.
>
> This patch addresses the MMIO conflict for 64-bit BARs and not for
> 32-bit BARs (if any). The patch does not introduce the 32-bit BAR
> limitation.

Right. I misinterpreted the problem you mentioned about 32-bit BARs.

Michael

^ permalink raw reply	[flat|nested] 4+ messages in thread
end of thread, other threads:[~2026-04-08 13:53 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --
2026-04-02 23:43 [PATCH v2] PCI: hv: Allocate MMIO from above 4GB for the config window Dexuan Cui
2026-04-05 23:15 ` Michael Kelley
2026-04-08  9:24   ` Dexuan Cui
2026-04-08 13:53     ` Michael Kelley