From: Dexuan Cui <decui@microsoft.com>
To: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
decui@microsoft.com, longli@microsoft.com, lpieralisi@kernel.org,
kwilczynski@kernel.org, mani@kernel.org, robh@kernel.org,
bhelgaas@google.com, jakeo@microsoft.com,
linux-hyperv@vger.kernel.org, linux-pci@vger.kernel.org,
linux-kernel@vger.kernel.org, mhklinux@outlook.com,
matthew.ruffell@canonical.com, kjlx@templeofstupid.com
Cc: Krister Johansen <johansen@templeofstupid.com>, stable@vger.kernel.org
Subject: [PATCH v2] PCI: hv: Allocate MMIO from above 4GB for the config window
Date: Thu, 2 Apr 2026 16:43:13 -0700 [thread overview]
Message-ID: <20260402234313.2490779-1-decui@microsoft.com> (raw)
There has been a longstanding MMIO conflict between the pci_hyperv
driver's config_window (see hv_allocate_config_window()) and the
hyperv_drm (or hyperv_fb) driver (see hyperv_setup_vram()): typically
both get MMIO from the low MMIO range below 4GB; this is not an issue
in the normal kernel since the VMBus driver reserves the framebuffer
MMIO range in vmbus_reserve_fb(), so the drm driver's hyperv_setup_vram()
can always get the reserved framebuffer MMIO; however, a Gen2 VM's
kdump kernel can fail to reserve the framebuffer MMIO in
vmbus_reserve_fb() because the screen_info.lfb_base is zero in the
kdump kernel due to several possible reasons (see the Link below for
more details):
1) on ARM64, the two syscalls (KEXEC_LOAD, KEXEC_FILE_LOAD) don't
initialize the screen_info.lfb_base for the kdump kernel;
2) on x86-64, the KEXEC_FILE_LOAD syscall initializes kdump kernel's
screen_info.lfb_base, but the KEXEC_LOAD syscall doesn't really do that
when the hyperv_drm driver loads, because the user-space kexec-tools
(i.e. the program 'kexec') doesn't recognize the hyperv_drm driver
(let's ignore the behavior of kexec-tools of very old versions).
When vmbus_reserve_fb() fails to reserve the framebuffer MMIO in the
kdump kernel, if pci_hyperv in the kdump kernel loads before hyperv_drm
loads, pci_hyperv's vmbus_allocate_mmio() gets the framebuffer MMIO
and tries to use it, but since the host thinks that the MMIO range is
still in use by hyperv_drm, the host refuses to accept the MMIO range
as the config window, and pci_hyperv's hv_pci_enter_d0() errors out,
e.g. an error can be "PCI Pass-through VSP failed D0 Entry with status
c0370048".
Typically, this pci_hyperv error in the kdump kernel was not fatal in
the past because the kdump kernel typically doesn't rely on pci_hyperv,
i.e. the root file system is on a VMBus SCSI device.
Now, a VM on Azure can boot from NVMe, i.e. the root file system can be
on a NVMe device, which depends on pci_hyperv. When the error occurs,
the kdump kernel fails to boot up since no root file system is detected.
Fix the MMIO conflict by allocating MMIO above 4GB for the config_window,
so it won't conflict with hyperv_drm's MMIO, which should be below the
4GB boundary. The size of config_window is small: it's only 8KB per PCI
device, so there should be sufficient MMIO space available above 4GB.
Note: we still need to figure out how to address the possible MMIO
conflict between hyperv_drm and pci_hyperv in the case of 32-bit PCI
MMIO BARs, but that's of low priority because all PCI devices available
to a Linux VM on Azure or on a modern host should use 64-bit BARs and
should not use 32-bit BARs -- I checked Mellanox VFs, MANA VFs, NVMe
devices, and GPUs in Linux VMs on Azure, and found no 32-bit BARs.
Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
Link: https://lore.kernel.org/all/SA1PR21MB692176C1BC53BFC9EAE5CF8EBF51A@SA1PR21MB6921.namprd21.prod.outlook.com/
Tested-by: Matthew Ruffell <matthew.ruffell@canonical.com>
Tested-by: Krister Johansen <johansen@templeofstupid.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: stable@vger.kernel.org
---
Changes since v1:
Updated the commit message and the comment to better explain
why screen_info.lfb_base can be 0 in the kdump kernel.
No code change since v1.
drivers/pci/controller/pci-hyperv.c | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 2c7a406b4ba8..1a79334ea9f4 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -3403,9 +3403,26 @@ static int hv_allocate_config_window(struct hv_pcibus_device *hbus)
/*
* Set up a region of MMIO space to use for accessing configuration
- * space.
+ * space. Use the high MMIO range to not conflict with the hyperv_drm
+ * driver (which normally gets MMIO from the low MMIO range) in the
+ * kdump kernel of a Gen2 VM, which may fail to reserve the framebuffer
+ * MMIO range in vmbus_reserve_fb() due to screen_info.lfb_base being
+ * zero in the kdump kernel:
+ *
+ * on ARM64, the two syscalls (KEXEC_LOAD, KEXEC_FILE_LOAD) don't
+ * initialize the screen_info.lfb_base for the kdump kernel;
+ *
+ * on x86-64, the KEXEC_FILE_LOAD syscall initializes kdump kernel's
+ * screen_info.lfb_base (see bzImage64_load() -> setup_boot_parameters())
+ * but the KEXEC_LOAD syscall doesn't really do that when the hyperv_drm
+ * driver loads, because the user-space program 'kexec' doesn't
+ * recognize hyperv_drm: see the function setup_linux_vesafb() in the
+ * kexec-tools.git repo. Note: old versions of kexec-tools, e.g.
+ * v2.0.18, initialize screen_info.lfb_base if the hyperv_fb driver
+ * loads, but hyperv_fb is deprecated and has been removed from the
+ * mainline kernel.
*/
- ret = vmbus_allocate_mmio(&hbus->mem_config, hbus->hdev, 0, -1,
+ ret = vmbus_allocate_mmio(&hbus->mem_config, hbus->hdev, SZ_4G, -1,
PCI_CONFIG_MMIO_LENGTH, 0x1000, false);
if (ret)
return ret;
--
2.43.0
next reply other threads:[~2026-04-02 23:44 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-02 23:43 Dexuan Cui [this message]
2026-04-05 23:15 ` [PATCH v2] PCI: hv: Allocate MMIO from above 4GB for the config window Michael Kelley
2026-04-08 9:24 ` Dexuan Cui
2026-04-08 13:53 ` Michael Kelley
2026-04-15 15:30 ` Dexuan Cui
2026-04-15 16:46 ` Dexuan Cui
2026-04-23 17:40 ` Michael Kelley
2026-04-29 1:58 ` Dexuan Cui
2026-04-29 18:01 ` Michael Kelley
2026-04-16 18:49 ` Dexuan Cui
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260402234313.2490779-1-decui@microsoft.com \
--to=decui@microsoft.com \
--cc=bhelgaas@google.com \
--cc=haiyangz@microsoft.com \
--cc=jakeo@microsoft.com \
--cc=johansen@templeofstupid.com \
--cc=kjlx@templeofstupid.com \
--cc=kwilczynski@kernel.org \
--cc=kys@microsoft.com \
--cc=linux-hyperv@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=longli@microsoft.com \
--cc=lpieralisi@kernel.org \
--cc=mani@kernel.org \
--cc=matthew.ruffell@canonical.com \
--cc=mhklinux@outlook.com \
--cc=robh@kernel.org \
--cc=stable@vger.kernel.org \
--cc=wei.liu@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.