public inbox for kvm@vger.kernel.org
From: bugzilla-daemon@kernel.org
To: kvm@vger.kernel.org
Subject: [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
Date: Thu, 01 May 2025 14:42:07 +0000	[thread overview]
Message-ID: <bug-220057-28872-SKb2todkZo@https.bugzilla.kernel.org/> (raw)
In-Reply-To: <bug-220057-28872@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=220057

--- Comment #40 from Alex Williamson (alex.williamson@redhat.com) ---
The vfio_pci_mmap_huge_fault logs with order > 0 and ending in 0x800 are normal;
they indicate we can't create the huge page mapping due to alignment
requirements. 0x800 is VM_FAULT_FALLBACK (ie. fall back to a smaller mapping).
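
A simplified sketch of the alignment condition behind that fallback (this is
not the actual driver code; the real vfio_pci_mmap_huge_fault also checks that
the mapping fits within the VMA and the BAR):

```python
PAGE_SHIFT = 12  # 4K base pages on x86

def huge_fault_ok(vaddr, phys, order):
    """A huge mapping of 2**order base pages requires both the faulting
    virtual address and the backing physical address to be aligned to
    the huge mapping size; otherwise the driver must fall back
    (VM_FAULT_FALLBACK) to a smaller order."""
    size = 1 << (PAGE_SHIFT + order)
    return vaddr % size == 0 and phys % size == 0

# A PMD-order (order 9, 2MB) fault succeeds only with 2MB alignment:
print(huge_fault_ok(0x200000, 0x40000000, 9))  # True
print(huge_fault_ok(0x201000, 0x40000000, 9))  # False: vaddr misaligned
```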

However, there are three instances of:

May 01 10:01:37 pve QEMU[1972]: error: kvm run failed Bad address
May 01 10:01:37 pve QEMU[1972]: error: kvm run failed Bad address
May 01 10:01:37 pve QEMU[1972]: error: kvm run failed Bad address

And three instances of:

May 01 10:01:37 pve kernel: vfio-pci 0000:01:00.0:
vfio_pci_mmap_huge_fault(,order = 0) BAR 1 page offset 0x3798: 0x1
May 01 10:01:37 pve kernel: vfio-pci 0000:01:00.0:
vfio_pci_mmap_huge_fault(,order = 0) BAR 1 page offset 0x3710: 0x1
May 01 10:01:37 pve kernel: vfio-pci 0000:01:00.0:
vfio_pci_mmap_huge_fault(,order = 0) BAR 1 page offset 0x3688: 0x1

0x1 is VM_FAULT_OOM, so likely at some point in trying to insert the pte, we
got an -ENOMEM.
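
For reference, both return values in these logs are VM_FAULT_* bits from the
kernel's enum vm_fault_reason (include/linux/mm_types.h); the constants below
are copied for illustration and could move between kernel releases:

```python
# Subset of VM_FAULT_* bits (enum vm_fault_reason, include/linux/mm_types.h)
VM_FAULT_BITS = {
    0x000001: "VM_FAULT_OOM",       # allocation failed while handling fault
    0x000800: "VM_FAULT_FALLBACK",  # huge mapping failed, retry smaller order
}

def decode_fault(code):
    """Name every VM_FAULT_* bit set in a logged fault return value."""
    return [name for bit, name in VM_FAULT_BITS.items() if code & bit]

print(decode_fault(0x800))  # the benign order > 0 log lines
print(decode_fault(0x1))    # the three failing order = 0 faults
```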

The system has 128GB of RAM, 98GB of which is dedicated to 1G hugepages.  This
VM is configured for 32GB.  What happens if fewer hugepages are reserved?
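
One way to test that, assuming the reservation is done at runtime rather than
on the kernel command line (the sysfs paths below are the standard hugetlb
interface; the 64-page value is only an example):

```shell
# Inspect the current 1G hugepage reservation and overall hugepage state
cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
grep Huge /proc/meminfo

# Shrink the reservation, e.g. to 64 pages (64GB), returning the rest
# to the normal page allocator, then retry the VM
echo 64 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
```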

Also note that if we were able to populate the MMIO mappings using huge pages,
which would occur if the VM BIOS had placed the mappings within the DMA
mappable range of the IOMMU (ie. avoiding the VFIO_MAP_DMA failures), we'd be
using fewer page table entries than even the previous code (ie. less memory).
The issue might simply come down to the fact that previously we attempted to
fault in the entire MMIO mapping on the first fault, when memory was still
available; now we fault on access, expecting to fault less often thanks to
huge pages, but that benefit is not coming to fruition due to the bad VM
configuration.

I think we're going to need to figure out if/how Proxmox enables setting
guest-phys-bits=39, or the host needs to free up some memory from hugepages.
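
For reference, on plain QEMU (9.0 or newer) this is an x86 CPU property; the
open question is only how Proxmox, which generates its own QEMU command line,
would pass it through. The invocation below is a hypothetical fragment, not a
complete command:

```shell
# Limit the advertised guest physical address width to 39 bits so the
# firmware places 64-bit BARs within the IOMMU's DMA-mappable range
qemu-system-x86_64 -cpu host,guest-phys-bits=39 ...
```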


