public inbox for linux-kernel@vger.kernel.org
From: Dexuan Cui <decui@microsoft.com>
To: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, longli@microsoft.com, lpieralisi@kernel.org,
	kwilczynski@kernel.org, mani@kernel.org, robh@kernel.org,
	bhelgaas@google.com, jakeo@microsoft.com,
	linux-hyperv@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, mhklinux@outlook.com,
	matthew.ruffell@canonical.com, kjlx@templeofstupid.com
Cc: Krister Johansen <johansen@templeofstupid.com>, stable@vger.kernel.org
Subject: [PATCH v2] PCI: hv: Allocate MMIO from above 4GB for the config window
Date: Thu,  2 Apr 2026 16:43:13 -0700
Message-ID: <20260402234313.2490779-1-decui@microsoft.com>

There has been a longstanding MMIO conflict between the pci_hyperv
driver's config_window (see hv_allocate_config_window()) and the
hyperv_drm (or hyperv_fb) driver (see hyperv_setup_vram()): typically
both get MMIO from the low MMIO range below 4GB.

This is not an issue in the normal kernel, because the VMBus driver
reserves the framebuffer MMIO range in vmbus_reserve_fb(), so the drm
driver's hyperv_setup_vram() can always get the reserved framebuffer
MMIO. However, a Gen2 VM's kdump kernel can fail to reserve the
framebuffer MMIO in vmbus_reserve_fb() because screen_info.lfb_base is
zero in the kdump kernel, for several possible reasons (see the Link
below for more details):

1) on ARM64, the two syscalls (KEXEC_LOAD, KEXEC_FILE_LOAD) don't
initialize the screen_info.lfb_base for the kdump kernel;

2) on x86-64, the KEXEC_FILE_LOAD syscall initializes the kdump kernel's
screen_info.lfb_base, but the KEXEC_LOAD syscall doesn't really do that
when the hyperv_drm driver loads, because the user-space kexec-tools
(i.e. the program 'kexec') doesn't recognize the hyperv_drm driver
(let's ignore the behavior of very old versions of kexec-tools).

When vmbus_reserve_fb() fails to reserve the framebuffer MMIO in the
kdump kernel and pci_hyperv loads before hyperv_drm, pci_hyperv's
vmbus_allocate_mmio() call gets the framebuffer MMIO and tries to use
it. Since the host thinks that the MMIO range is still in use by
hyperv_drm, it refuses to accept the range as the config window, and
pci_hyperv's hv_pci_enter_d0() errors out, e.g. with
"PCI Pass-through VSP failed D0 Entry with status c0370048".

In the past, this pci_hyperv error in the kdump kernel was usually not
fatal, because the kdump kernel typically didn't rely on pci_hyperv:
the root file system was on a VMBus SCSI device.

Now, a VM on Azure can boot from NVMe, i.e. the root file system can be
on an NVMe device, which depends on pci_hyperv. When the error occurs,
the kdump kernel fails to boot up since no root file system is detected.

Fix the MMIO conflict by allocating MMIO above 4GB for the config_window,
so it won't conflict with hyperv_drm's MMIO, which should be below the
4GB boundary. The size of config_window is small: it's only 8KB per PCI
device, so there should be sufficient MMIO space available above 4GB.
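
As a rough illustration of the fix, here is a user-space toy model. It is
not the kernel's vmbus_allocate_mmio() implementation: the aperture and
framebuffer addresses are made up, and every toy_* name, mmio_range,
ranges_overlap() and config_window_hits_framebuffer() are hypothetical
helpers for this sketch. It only shows that a first-fit search starting
at 0 lands on the unreserved framebuffer range, while a search starting
at SZ_4G cannot:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_4G 0x100000000ULL
#define CONFIG_WINDOW_LEN 0x2000ULL /* 8KB, the size stated above */

struct mmio_range {
	uint64_t start;
	uint64_t end; /* inclusive */
};

/* Assumed apertures for illustration; real ones come from the host. */
static const struct mmio_range toy_apertures[] = {
	{ 0xF8000000ULL, 0xFFFFFFFFULL },     /* low MMIO, below 4GB */
	{ 0x1000000000ULL, 0x10FFFFFFFFULL }, /* high MMIO, above 4GB */
};

/* Assumed framebuffer range that the host still considers in use. */
static const struct mmio_range toy_framebuffer = {
	0xF8000000ULL, 0xF87FFFFFULL
};

static bool ranges_overlap(const struct mmio_range *a,
			   const struct mmio_range *b)
{
	return a->start <= b->end && b->start <= a->end;
}

/*
 * Toy first-fit allocator: find the lowest aligned range of `size` bytes
 * that lies within [min, max], inside one aperture, and outside
 * `reserved` (NULL means nothing was reserved).
 */
static bool toy_allocate_mmio(struct mmio_range *out,
			      const struct mmio_range *reserved,
			      uint64_t min, uint64_t max,
			      uint64_t size, uint64_t align)
{
	size_t n = sizeof(toy_apertures) / sizeof(toy_apertures[0]);

	assert(size > 0 && align > 0 && (align & (align - 1)) == 0);

	for (size_t i = 0; i < n; i++) {
		const struct mmio_range *ap = &toy_apertures[i];
		uint64_t lo = ap->start > min ? ap->start : min;
		uint64_t hi = ap->end < max ? ap->end : max;

		for (;;) {
			struct mmio_range cand;

			if (lo > hi)
				break;
			lo = (lo + align - 1) & ~(align - 1); /* align up */
			if (lo > hi || hi - lo < size - 1)
				break;
			cand.start = lo;
			cand.end = lo + size - 1;
			if (reserved && ranges_overlap(&cand, reserved)) {
				lo = reserved->end + 1; /* skip past it */
				continue;
			}
			*out = cand;
			return true;
		}
	}
	return false;
}

/* Model the kdump case: the framebuffer was NOT reserved (NULL). */
static bool config_window_hits_framebuffer(uint64_t min)
{
	struct mmio_range win;

	if (!toy_allocate_mmio(&win, NULL, min, UINT64_MAX,
			       CONFIG_WINDOW_LEN, 0x1000))
		return false;
	return ranges_overlap(&win, &toy_framebuffer);
}
```

In this model, config_window_hits_framebuffer(0) is true and
config_window_hits_framebuffer(SZ_4G) is false. Passing &toy_framebuffer
as `reserved` with min 0 models the normal kernel, where the allocator
simply skips the reserved range.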

Note: we still need to figure out how to address the possible MMIO
conflict between hyperv_drm and pci_hyperv in the case of 32-bit PCI
MMIO BARs, but that's of low priority because all PCI devices available
to a Linux VM on Azure or on a modern host should use 64-bit BARs and
should not use 32-bit BARs -- I checked Mellanox VFs, MANA VFs, NVMe
devices, and GPUs in Linux VMs on Azure, and found no 32-bit BARs.

Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
Link: https://lore.kernel.org/all/SA1PR21MB692176C1BC53BFC9EAE5CF8EBF51A@SA1PR21MB6921.namprd21.prod.outlook.com/
Tested-by: Matthew Ruffell <matthew.ruffell@canonical.com>
Tested-by: Krister Johansen <johansen@templeofstupid.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: stable@vger.kernel.org
---

Changes since v1:
    Updated the commit message and the comment to better explain
    why screen_info.lfb_base can be 0 in the kdump kernel.

    No code change since v1.


 drivers/pci/controller/pci-hyperv.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 2c7a406b4ba8..1a79334ea9f4 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -3403,9 +3403,26 @@ static int hv_allocate_config_window(struct hv_pcibus_device *hbus)
 
 	/*
 	 * Set up a region of MMIO space to use for accessing configuration
-	 * space.
+	 * space. Use the high MMIO range to not conflict with the hyperv_drm
+	 * driver (which normally gets MMIO from the low MMIO range) in the
+	 * kdump kernel of a Gen2 VM, which may fail to reserve the framebuffer
+	 * MMIO range in vmbus_reserve_fb() due to screen_info.lfb_base being
+	 * zero in the kdump kernel:
+	 *
+	 * on ARM64, the two syscalls (KEXEC_LOAD, KEXEC_FILE_LOAD) don't
+	 * initialize the screen_info.lfb_base for the kdump kernel;
+	 *
+	 * on x86-64, the KEXEC_FILE_LOAD syscall initializes kdump kernel's
+	 * screen_info.lfb_base (see bzImage64_load() -> setup_boot_parameters())
+	 * but the KEXEC_LOAD syscall doesn't really do that when the hyperv_drm
+	 * driver loads, because the user-space program 'kexec' doesn't
+	 * recognize hyperv_drm: see the function setup_linux_vesafb() in the
+	 * kexec-tools.git repo. Note: old versions of kexec-tools, e.g.
+	 * v2.0.18, initialize screen_info.lfb_base if the hyperv_fb driver
+	 * loads, but hyperv_fb is deprecated and has been removed from the
+	 * mainline kernel.
 	 */
-	ret = vmbus_allocate_mmio(&hbus->mem_config, hbus->hdev, 0, -1,
+	ret = vmbus_allocate_mmio(&hbus->mem_config, hbus->hdev, SZ_4G, -1,
 				  PCI_CONFIG_MMIO_LENGTH, 0x1000, false);
 	if (ret)
 		return ret;
-- 
2.43.0


