* [Bug 220057] New: Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
@ 2025-04-27 0:46 bugzilla-daemon
2025-04-27 0:48 ` [Bug 220057] " bugzilla-daemon
` (49 more replies)
0 siblings, 50 replies; 51+ messages in thread
From: bugzilla-daemon @ 2025-04-27 0:46 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
Bug ID: 220057
Summary: Kernel regression. Linux VMs crashing (I did not test
Windows guest VMs)
Product: Virtualization
Version: unspecified
Hardware: All
OS: Linux
Status: NEW
Severity: blocking
Priority: P3
Component: kvm
Assignee: virtualization_kvm@kernel-bugs.osdl.org
Reporter: adolfotregosa@gmail.com
Regression: No
Created attachment 308028
--> https://bugzilla.kernel.org/attachment.cgi?id=308028&action=edit
journalctl
I found a kernel regression. I'm using Proxmox, and any kernel with the
following commit:
https://github.com/torvalds/linux/commit/f9e54c3a2f5b79ecc57c7bc7d0d3521e461a2101
causes an instant VM crash in some situations that involve GPU acceleration.
I'm using NVIDIA GPU passthrough, but another person experienced the same
crashes with an AMD 9070 XT. In my case, this occurs when playing a simple
YouTube video in Chromium-based browsers or when running some games.
I have confirmed that reverting this commit prevents my Linux VMs from
crashing.
I've attached the host's journalctl output. The error is always exactly the
same.
--
You may reply to this email to add a comment.
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply [flat|nested] 51+ messages in thread
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
2025-04-27 0:46 [Bug 220057] New: Kernel regression. Linux VMs crashing (I did not test Windows guest VMs) bugzilla-daemon
@ 2025-04-27 0:48 ` bugzilla-daemon
2025-04-27 0:50 ` bugzilla-daemon
` (48 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: bugzilla-daemon @ 2025-04-27 0:48 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #1 from Adolfo (adolfotregosa@gmail.com) ---
Created attachment 308029
--> https://bugzilla.kernel.org/attachment.cgi?id=308029&action=edit
revert patch
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
2025-04-27 0:46 [Bug 220057] New: Kernel regression. Linux VMs crashing (I did not test Windows guest VMs) bugzilla-daemon
2025-04-27 0:48 ` [Bug 220057] " bugzilla-daemon
@ 2025-04-27 0:50 ` bugzilla-daemon
2025-04-27 23:34 ` bugzilla-daemon
` (47 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: bugzilla-daemon @ 2025-04-27 0:50 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
Adolfo (adolfotregosa@gmail.com) changed:
What |Removed |Added
----------------------------------------------------------------------------
Bisected commit-id| |f9e54c3a2f5b79ecc57c7bc7d0d
| |3521e461a2101
Regression|No |Yes
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
2025-04-27 0:46 [Bug 220057] New: Kernel regression. Linux VMs crashing (I did not test Windows guest VMs) bugzilla-daemon
2025-04-27 0:48 ` [Bug 220057] " bugzilla-daemon
2025-04-27 0:50 ` bugzilla-daemon
@ 2025-04-27 23:34 ` bugzilla-daemon
2025-04-28 0:00 ` bugzilla-daemon
` (46 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: bugzilla-daemon @ 2025-04-27 23:34 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
Artem S. Tashkinov (aros@gmx.com) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |alex.williamson@redhat.com
--- Comment #2 from Artem S. Tashkinov (aros@gmx.com) ---
Alex, please take a look.
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
2025-04-27 0:46 [Bug 220057] New: Kernel regression. Linux VMs crashing (I did not test Windows guest VMs) bugzilla-daemon
` (2 preceding siblings ...)
2025-04-27 23:34 ` bugzilla-daemon
@ 2025-04-28 0:00 ` bugzilla-daemon
2025-04-28 7:18 ` bugzilla-daemon
` (45 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: bugzilla-daemon @ 2025-04-28 0:00 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #3 from Alex Williamson (alex.williamson@redhat.com) ---
https://github.com/torvalds/linux/commit/09dfc8a5f2ce897005a94bf66cca4f91e4e03700
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
2025-04-27 0:46 [Bug 220057] New: Kernel regression. Linux VMs crashing (I did not test Windows guest VMs) bugzilla-daemon
` (3 preceding siblings ...)
2025-04-28 0:00 ` bugzilla-daemon
@ 2025-04-28 7:18 ` bugzilla-daemon
2025-04-28 8:25 ` bugzilla-daemon
` (44 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: bugzilla-daemon @ 2025-04-28 7:18 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #4 from Adolfo (adolfotregosa@gmail.com) ---
(In reply to Alex Williamson from comment #3)
> https://github.com/torvalds/linux/commit/
> 09dfc8a5f2ce897005a94bf66cca4f91e4e03700
I should have specified that I'm running kernel 6.14.4.
If that helps.
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
2025-04-27 0:46 [Bug 220057] New: Kernel regression. Linux VMs crashing (I did not test Windows guest VMs) bugzilla-daemon
` (4 preceding siblings ...)
2025-04-28 7:18 ` bugzilla-daemon
@ 2025-04-28 8:25 ` bugzilla-daemon
2025-04-28 15:10 ` bugzilla-daemon
` (43 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: bugzilla-daemon @ 2025-04-28 8:25 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #5 from Adolfo (adolfotregosa@gmail.com) ---
(In reply to Alex Williamson from comment #3)
> https://github.com/torvalds/linux/commit/
> 09dfc8a5f2ce897005a94bf66cca4f91e4e03700
I checked. That commit isn't a fix for the crashes in my case, since I tested
vanilla 6.14.4 and that commit is already present. How can I help if needed?
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
2025-04-27 0:46 [Bug 220057] New: Kernel regression. Linux VMs crashing (I did not test Windows guest VMs) bugzilla-daemon
` (5 preceding siblings ...)
2025-04-28 8:25 ` bugzilla-daemon
@ 2025-04-28 15:10 ` bugzilla-daemon
2025-04-28 19:41 ` bugzilla-daemon
` (42 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: bugzilla-daemon @ 2025-04-28 15:10 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #6 from Alex Williamson (alex.williamson@redhat.com) ---
What's the VM configuration? The GPU assigned? The host CPU? The QEMU
version? Is the guest using nouveau or the nvidia driver? Please link the
other report of this issue.
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
2025-04-27 0:46 [Bug 220057] New: Kernel regression. Linux VMs crashing (I did not test Windows guest VMs) bugzilla-daemon
` (6 preceding siblings ...)
2025-04-28 15:10 ` bugzilla-daemon
@ 2025-04-28 19:41 ` bugzilla-daemon
2025-04-28 21:11 ` bugzilla-daemon
` (41 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: bugzilla-daemon @ 2025-04-28 19:41 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #7 from Adolfo (adolfotregosa@gmail.com) ---
(In reply to Alex Williamson from comment #6)
> What's the VM configuration? The GPU assigned? The host CPU? The QEMU
> version? Is the guest using nouveau or the nvidia driver? Please link the
> other report of this issue.
13900, Z790 chipset, 128 GB RAM.
Guest set to Q35. CPU set to host. QEMU 9.2. Guest is using the nvidia driver.
Crashes happen on both a 4060 Ti and a 5060 Ti.
The other report, with an AMD 9070 XT:
https://forum.proxmox.com/threads/opt-in-linux-6-14-kernel-for-proxmox-ve-8-available-on-test-no-subscription.164497/page-5#post-763760
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
2025-04-27 0:46 [Bug 220057] New: Kernel regression. Linux VMs crashing (I did not test Windows guest VMs) bugzilla-daemon
` (7 preceding siblings ...)
2025-04-28 19:41 ` bugzilla-daemon
@ 2025-04-28 21:11 ` bugzilla-daemon
2025-04-28 21:22 ` bugzilla-daemon
` (40 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: bugzilla-daemon @ 2025-04-28 21:11 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #8 from Alex Williamson (alex.williamson@redhat.com) ---
(In reply to Adolfo from comment #7)
> 13900, Z790 chipset, 128 GB RAM.
> Guest set to Q35. CPU set to host. QEMU 9.2. Guest is using the nvidia driver.
> Crashes happen on both a 4060 Ti and a 5060 Ti.
>
> The other report, with an AMD 9070 XT:
>
> https://forum.proxmox.com/threads/opt-in-linux-6-14-kernel-for-proxmox-ve-8-
> available-on-test-no-subscription.164497/page-5#post-763760
I'm not able to access the attachments of this report without a Proxmox
subscription key, so I can't draw any conclusions about whether this is
related. I do note the post is originally dated April 14th, so it's not based
on v6.14.4; it might be based on a kernel with broken bus reset support that
was reverted in v6.14.4.
I don't see any similar issues running a stock 6.14.4 kernel, qemu 9.2, Linux
guest (6.14.3) running nvidia 570.144, youtube playback in chromium.
Please provide full VM XML or libvirt log, host 'sudo dmesg', host 'sudo lspci
-vvv', guest nvidia driver version.
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
2025-04-27 0:46 [Bug 220057] New: Kernel regression. Linux VMs crashing (I did not test Windows guest VMs) bugzilla-daemon
` (8 preceding siblings ...)
2025-04-28 21:11 ` bugzilla-daemon
@ 2025-04-28 21:22 ` bugzilla-daemon
2025-04-28 21:24 ` bugzilla-daemon
` (39 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: bugzilla-daemon @ 2025-04-28 21:22 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #9 from Adolfo (adolfotregosa@gmail.com) ---
Created attachment 308041
--> https://bugzilla.kernel.org/attachment.cgi?id=308041&action=edit
Proxmox forum screenshots
You don't need a subscription. Just create an account. Either way you have the
screenshots attached.
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
2025-04-27 0:46 [Bug 220057] New: Kernel regression. Linux VMs crashing (I did not test Windows guest VMs) bugzilla-daemon
` (9 preceding siblings ...)
2025-04-28 21:22 ` bugzilla-daemon
@ 2025-04-28 21:24 ` bugzilla-daemon
2025-04-28 21:26 ` bugzilla-daemon
` (38 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: bugzilla-daemon @ 2025-04-28 21:24 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #10 from Adolfo (adolfotregosa@gmail.com) ---
Created attachment 308042
--> https://bugzilla.kernel.org/attachment.cgi?id=308042&action=edit
dmesg output
dmesg output.
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
2025-04-27 0:46 [Bug 220057] New: Kernel regression. Linux VMs crashing (I did not test Windows guest VMs) bugzilla-daemon
` (10 preceding siblings ...)
2025-04-28 21:24 ` bugzilla-daemon
@ 2025-04-28 21:26 ` bugzilla-daemon
2025-04-28 21:39 ` bugzilla-daemon
` (37 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: bugzilla-daemon @ 2025-04-28 21:26 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #11 from Adolfo (adolfotregosa@gmail.com) ---
Created attachment 308043
--> https://bugzilla.kernel.org/attachment.cgi?id=308043&action=edit
lspci -vvv
lspci -vvv
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
2025-04-27 0:46 [Bug 220057] New: Kernel regression. Linux VMs crashing (I did not test Windows guest VMs) bugzilla-daemon
` (11 preceding siblings ...)
2025-04-28 21:26 ` bugzilla-daemon
@ 2025-04-28 21:39 ` bugzilla-daemon
2025-04-28 21:42 ` bugzilla-daemon
` (36 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: bugzilla-daemon @ 2025-04-28 21:39 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #12 from Adolfo (adolfotregosa@gmail.com) ---
I tested both NVIDIA driver versions 570.144 and the beta 575.51.02.
Assuming my machine is fine — since reverting that commit resolved the issue —
I believe the reason there are so few reports is that Proxmox still ships with
kernel 6.8. We had an opt-in for 6.11.11, and it also works fine.
Recently, 6.14 was offered as an opt-in, and that's what led me down the rabbit
hole. I started compiling vanilla kernels: 6.12.25 LTS crashed, and 6.12-rc1
crashed as well. This indicated that the problem was introduced between 6.11.11
and 6.12-rc1.
I performed a bisect, which led me to the problematic commit. After reverting
that commit, I can now run 6.14.4 without issues, just like 6.11.11.
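The narrowing described above (6.11.11 good, 6.12-rc1 bad, then a bisect) is a binary search over an ordered history. Below is a minimal Python sketch of that idea, not of `git bisect` itself. The commit names and the `is_bad` predicate are hypothetical stand-ins for building and testing a kernel.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, commits[0] is known
    good, commits[-1] is known bad, and every commit after the first bad
    one is also bad (a linear history, which real bisects only approximate).
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi]


# Hypothetical history: "c5" stands in for the regressing commit.
history = [f"c{i}" for i in range(10)]
culprit = first_bad(history, lambda c: int(c[1:]) >= 5)
print(culprit)
```

Each step halves the remaining range, which is why a few builds are enough to isolate one commit among thousands.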
Regarding the Proxmox VM guest configuration: Proxmox does not use libvirt, but
I've included the VM configuration file below.
GNU nano 7.2 /etc/pve/qemu-server/200.conf
-------------
affinity: 0-7
agent: 0
args: -machine hpet=off
balloon: 0
bios: ovmf
boot: order=hostpci2
cores: 8
cpu: host,flags=+pdpe1gb
cpuunits: 200
efidisk0: local-lvm:vm-200-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
hookscript: local:snippets/200.pl
hostpci0: 0000:01:00,pcie=1,rombar=0
hostpci1: 0000:00:14.0,rombar=0
hostpci2: 0000:06:00.0,rombar=0
hostpci4: 0000:03:0a.2,rombar=0
hotplug: usb
hugepages: 1024
kvm: 1
machine: q35
memory: 32768
meta: creation-qemu=9.0.2,ctime=1737473987
name: cachyOS
numa: 1
onboot: 0
ostype: l26
scsihw: virtio-scsi-single
smbios1: uuid=f75921fc-d45e-4463-8590-8a4aab19e6e8
sockets: 1
startup: order=3,up=300,down=20
tablet: 0
vga: none
vmgenid: 2cd9b643-58a2-4ac4-a01c-f6a131e65c6d
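For readers who want to inspect such a file programmatically, here is a minimal Python sketch that splits `key: value` lines and lists the `hostpci*` passthrough entries. The parsing rules are an assumption based only on the lines shown above, not on the Proxmox schema, and `parse_conf` is a hypothetical helper.

```python
def parse_conf(text: str) -> dict[str, str]:
    """Split 'key: value' lines into a dict (assumed format, see lead-in)."""
    conf = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # skip lines that do not have a 'key: value' shape
            conf[key.strip()] = value.strip()
    return conf


# A few lines copied from the configuration above.
sample = """\
cpu: host,flags=+pdpe1gb
hostpci0: 0000:01:00,pcie=1,rombar=0
hostpci1: 0000:00:14.0,rombar=0
machine: q35
"""

conf = parse_conf(sample)
# The PCI address is taken to be the first comma-separated field.
passthrough = {
    key: value.split(",")[0]
    for key, value in conf.items()
    if key.startswith("hostpci")
}
print(passthrough)
```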
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
2025-04-27 0:46 [Bug 220057] New: Kernel regression. Linux VMs crashing (I did not test Windows guest VMs) bugzilla-daemon
` (12 preceding siblings ...)
2025-04-28 21:39 ` bugzilla-daemon
@ 2025-04-28 21:42 ` bugzilla-daemon
2025-04-28 21:49 ` bugzilla-daemon
` (35 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: bugzilla-daemon @ 2025-04-28 21:42 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #13 from Adolfo (adolfotregosa@gmail.com) ---
Created attachment 308044
--> https://bugzilla.kernel.org/attachment.cgi?id=308044&action=edit
proxmox_VM
Proxmox VM Hardware config and qemu version screenshot
Hopefully, I provided all the information you asked for.
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
2025-04-27 0:46 [Bug 220057] New: Kernel regression. Linux VMs crashing (I did not test Windows guest VMs) bugzilla-daemon
` (13 preceding siblings ...)
2025-04-28 21:42 ` bugzilla-daemon
@ 2025-04-28 21:49 ` bugzilla-daemon
2025-04-28 22:15 ` bugzilla-daemon
` (34 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: bugzilla-daemon @ 2025-04-28 21:49 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #14 from Adolfo (adolfotregosa@gmail.com) ---
Created attachment 308045
--> https://bugzilla.kernel.org/attachment.cgi?id=308045&action=edit
vfio_map_dma failed
I forgot. I have no idea if this is even remotely linked but I'll leave it here
just in case.
host journalctl: vfio_map_dma failed.
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
2025-04-27 0:46 [Bug 220057] New: Kernel regression. Linux VMs crashing (I did not test Windows guest VMs) bugzilla-daemon
` (14 preceding siblings ...)
2025-04-28 21:49 ` bugzilla-daemon
@ 2025-04-28 22:15 ` bugzilla-daemon
2025-04-28 22:27 ` bugzilla-daemon
` (33 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: bugzilla-daemon @ 2025-04-28 22:15 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #15 from Alex Williamson (alex.williamson@redhat.com) ---
(In reply to Adolfo from comment #14)
> Created attachment 308045 [details]
> vfio_map_dma failed
>
> I forgot. I have no idea if this is even remotely linked but I'll leave it
> here just in case.
>
> host journalctl: vfio_map_dma failed.
Does adding ",phys-bits=39" to the cpu: line in the config file resolve these
errors? Please include output of lscpu.
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
2025-04-27 0:46 [Bug 220057] New: Kernel regression. Linux VMs crashing (I did not test Windows guest VMs) bugzilla-daemon
` (15 preceding siblings ...)
2025-04-28 22:15 ` bugzilla-daemon
@ 2025-04-28 22:27 ` bugzilla-daemon
2025-04-28 22:29 ` bugzilla-daemon
` (32 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: bugzilla-daemon @ 2025-04-28 22:27 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #16 from Adolfo (adolfotregosa@gmail.com) ---
(In reply to Alex Williamson from comment #15)
> (In reply to Adolfo from comment #14)
> > Created attachment 308045 [details]
> > vfio_map_dma failed
> >
> > I forgot. I have no idea if this is even remotely linked but I'll leave it
> > here just in case.
> >
> > host journalctl: vfio_map_dma failed.
>
> Does adding ",phys-bits=39" to the cpu: line in the config file resolve
> these errors? Please include output of lscpu.
Doesn't seem to do anything, no.
----
cores: 8
cpu: host,flags=+pdpe1gb,phys-bits=39
cpuunits: 200
efidisk0: local-lvm:vm-200-disk-0,efitype=4m,pre-enrolled-keys=1,size=4
...
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
2025-04-27 0:46 [Bug 220057] New: Kernel regression. Linux VMs crashing (I did not test Windows guest VMs) bugzilla-daemon
` (16 preceding siblings ...)
2025-04-28 22:27 ` bugzilla-daemon
@ 2025-04-28 22:29 ` bugzilla-daemon
2025-04-28 22:31 ` bugzilla-daemon
` (31 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: bugzilla-daemon @ 2025-04-28 22:29 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #17 from Adolfo (adolfotregosa@gmail.com) ---
Created attachment 308046
--> https://bugzilla.kernel.org/attachment.cgi?id=308046&action=edit
phys-bits=39
log with phys-bits=39 on cpu line
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
2025-04-27 0:46 [Bug 220057] New: Kernel regression. Linux VMs crashing (I did not test Windows guest VMs) bugzilla-daemon
` (17 preceding siblings ...)
2025-04-28 22:29 ` bugzilla-daemon
@ 2025-04-28 22:31 ` bugzilla-daemon
2025-04-28 22:46 ` bugzilla-daemon
` (30 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: bugzilla-daemon @ 2025-04-28 22:31 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #18 from Adolfo (adolfotregosa@gmail.com) ---
Created attachment 308047
--> https://bugzilla.kernel.org/attachment.cgi?id=308047&action=edit
lspcu output
lscpu output, as requested.
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
2025-04-27 0:46 [Bug 220057] New: Kernel regression. Linux VMs crashing (I did not test Windows guest VMs) bugzilla-daemon
` (18 preceding siblings ...)
2025-04-28 22:31 ` bugzilla-daemon
@ 2025-04-28 22:46 ` bugzilla-daemon
2025-04-28 23:05 ` bugzilla-daemon
` (29 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: bugzilla-daemon @ 2025-04-28 22:46 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #19 from Alex Williamson (alex.williamson@redhat.com) ---
(In reply to Adolfo from comment #16)
>
> Doesn't seem to do anything, no.
>
> ----
> cores: 8
> cpu: host,flags=+pdpe1gb,phys-bits=39
> cpuunits: 200
> efidisk0: local-lvm:vm-200-disk-0,efitype=4m,pre-enrolled-keys=1,size=4
> ...
I'm getting that option from here:
https://pve.proxmox.com/wiki/Manual:_qm.conf
Can you find the QEMU command line in ps while the VM is running? For example:
`ps aux | grep qemu`. There should be a difference in the QEMU command line
Proxmox is using with the option, and it should at least change the addresses
based at 0x380000000000 in the logs.
I think the issue with the failed mappings is that your CPU physical address
width is 46 bits:
Address sizes: 46 bits physical, 48 bits virtual
But the host IOMMU width is 39 bits:
[ 0.341856] DMAR: Host address width 39
Therefore the VM is giving the devices an IOVA that cannot be mapped by the
host IOMMU. I don't know if ultimately that contributes to the issue you're
reporting, but it might.
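The width mismatch can be checked with simple arithmetic. The following is a minimal Python sketch, assuming a mapping is valid only if it ends at or below 2**width; the `fits` helper and the 0x1000 length are hypothetical and chosen purely for illustration.

```python
def fits(addr: int, length: int, width_bits: int) -> bool:
    """True if the range [addr, addr + length) lies below 2**width_bits."""
    return addr + length <= 1 << width_bits


iova = 0x380000000000  # base address seen in the logs
length = 0x1000        # hypothetical mapping length, for illustration only

print(iova.bit_length())       # bits needed to represent the address
print(fits(iova, length, 39))  # host IOMMU width (DMAR: Host address width 39)
print(fits(iova, length, 46))  # CPU physical address width from lscpu
```

The address needs 46 bits, so it fits the CPU's physical address space but not a 39-bit IOMMU address space.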
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
2025-04-27 0:46 [Bug 220057] New: Kernel regression. Linux VMs crashing (I did not test Windows guest VMs) bugzilla-daemon
` (19 preceding siblings ...)
2025-04-28 22:46 ` bugzilla-daemon
@ 2025-04-28 23:05 ` bugzilla-daemon
2025-04-29 6:54 ` bugzilla-daemon
` (28 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: bugzilla-daemon @ 2025-04-28 23:05 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #20 from Adolfo (adolfotregosa@gmail.com) ---
Created attachment 308048
--> https://bugzilla.kernel.org/attachment.cgi?id=308048&action=edit
log_vm_start_up_to_crash
As far as I can tell, it changes nothing. I booted the unpatched kernel and
attached the complete log from VM startup up to the crash.
The output of `ps aux | grep qemu` is at the start of the file.
--
You may reply to this email to add a comment.
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply [flat|nested] 51+ messages in thread
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
2025-04-27 0:46 [Bug 220057] New: Kernel regression. Linux VMs crashing (I did not test Windows guest VMs) bugzilla-daemon
` (20 preceding siblings ...)
2025-04-28 23:05 ` bugzilla-daemon
@ 2025-04-29 6:54 ` bugzilla-daemon
2025-04-29 8:08 ` bugzilla-daemon
` (27 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: bugzilla-daemon @ 2025-04-29 6:54 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
Fabian Grünbichler (f.gruenbichler@proxmox.com) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |f.gruenbichler@proxmox.com
--- Comment #21 from Fabian Grünbichler (f.gruenbichler@proxmox.com) ---
FWIW, you can get the full QEMU commandline for a given VM on PVE with "qm
showcmd XXX --pretty". you can also use this to verify whether config changes
have the desired effect ;)
our kernels are based on Ubuntu's, but since it seems you can also reproduce
the issue with a plain upstream kernel, I'll not go into too much detail about
that, unless you want me to.
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
2025-04-27 0:46 [Bug 220057] New: Kernel regression. Linux VMs crashing (I did not test Windows guest VMs) bugzilla-daemon
` (21 preceding siblings ...)
2025-04-29 6:54 ` bugzilla-daemon
@ 2025-04-29 8:08 ` bugzilla-daemon
2025-04-29 8:09 ` bugzilla-daemon
` (26 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: bugzilla-daemon @ 2025-04-29 8:08 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #22 from Adolfo (adolfotregosa@gmail.com) ---
Created attachment 308049
--> https://bugzilla.kernel.org/attachment.cgi?id=308049&action=edit
phys-bits=host
It seems phys-bits=host actually changes something, although the "VFIO_MAP_DMA
failed" error still shows up.
--
You may reply to this email to add a comment.
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply [flat|nested] 51+ messages in thread
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
2025-04-27 0:46 [Bug 220057] New: Kernel regression. Linux VMs crashing (I did not test Windows guest VMs) bugzilla-daemon
` (22 preceding siblings ...)
2025-04-29 8:08 ` bugzilla-daemon
@ 2025-04-29 8:09 ` bugzilla-daemon
2025-04-29 8:14 ` bugzilla-daemon
` (25 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: bugzilla-daemon @ 2025-04-29 8:09 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #23 from Adolfo (adolfotregosa@gmail.com) ---
Created attachment 308050
--> https://bugzilla.kernel.org/attachment.cgi?id=308050&action=edit
qm showcmd 200 --pretty
qm showcmd 200 --pretty output with phys-bits=host
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
2025-04-27 0:46 [Bug 220057] New: Kernel regression. Linux VMs crashing (I did not test Windows guest VMs) bugzilla-daemon
` (23 preceding siblings ...)
2025-04-29 8:09 ` bugzilla-daemon
@ 2025-04-29 8:14 ` bugzilla-daemon
2025-04-29 14:58 ` bugzilla-daemon
` (24 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: bugzilla-daemon @ 2025-04-29 8:14 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #24 from Adolfo (adolfotregosa@gmail.com) ---
Or maybe not. The address probably changed because I played with the ReBAR
setting in the BIOS? I don't have the knowledge to answer this. This VM does
not have phys-bits=host in the conf file.
Apr 29 09:12:09 pve QEMU[11923]: kvm: VFIO_MAP_DMA failed: Invalid argument
Apr 29 09:12:09 pve QEMU[11923]: kvm: vfio_container_dma_map(0x5c9222494280,
0x380000000000, 0x10000, 0x78075ee70000) = -22 (Invalid argument)
Apr 29 09:12:09 pve QEMU[11923]: kvm: VFIO_MAP_DMA failed: Invalid argument
Apr 29 09:12:09 pve QEMU[11923]: kvm: vfio_container_dma_map(0x5c9222494280,
0x380000011000, 0x3000, 0x78075ee69000) = -22 (Invalid argument)
Apr 29 09:12:09 pve QEMU[11923]: kvm: VFIO_MAP_DMA failed: Invalid argument
Apr 29 09:12:09 pve QEMU[11923]: kvm: vfio_container_dma_map(0x5c9222494280,
0x380000000000, 0x10000, 0x78075ee70000) = -22 (Invalid argument)
Apr 29 09:12:09 pve QEMU[11923]: kvm: VFIO_MAP_DMA failed: Invalid argument
Apr 29 09:12:09 pve QEMU[11923]: kvm: vfio_container_dma_map(0x5c9222494280,
0x380000011000, 0x3000, 0x78075ee69000) = -22 (Invalid argument)
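To check every failed mapping in a log like the one above against the 39-bit host IOMMU width reported earlier in the thread, a minimal Python sketch follows. The argument order (container, IOVA, size, host virtual address) is an assumption about QEMU's message format, and `failed_maps` is a hypothetical helper. The value -22 is the negated Linux errno EINVAL, matching "Invalid argument".

```python
import re

# Assumed argument order: (container, iova, size, vaddr) = return code.
PATTERN = re.compile(
    r"vfio_container_dma_map\((0x[0-9a-f]+),\s*(0x[0-9a-f]+),\s*"
    r"(0x[0-9a-f]+),\s*(0x[0-9a-f]+)\)\s*=\s*(-?\d+)"
)

IOMMU_BITS = 39  # "DMAR: Host address width 39" from the host dmesg


def failed_maps(log: str):
    """Yield (iova, size, ret, fits_iommu) for each logged mapping call."""
    for m in PATTERN.finditer(log):
        iova = int(m.group(2), 16)
        size = int(m.group(3), 16)
        ret = int(m.group(5))
        yield iova, size, ret, iova + size <= 1 << IOMMU_BITS


# Two entries copied from the log above (timestamps omitted).
log = (
    "kvm: vfio_container_dma_map(0x5c9222494280, 0x380000000000, 0x10000, "
    "0x78075ee70000) = -22 (Invalid argument)\n"
    "kvm: vfio_container_dma_map(0x5c9222494280, 0x380000011000, 0x3000, "
    "0x78075ee69000) = -22 (Invalid argument)\n"
)

results = list(failed_maps(log))
for iova, size, ret, ok in results:
    print(hex(iova), hex(size), ret, ok)
```

Both ranges start far above 2**39, so neither fits a 39-bit address space under this check.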
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
2025-04-27 0:46 [Bug 220057] New: Kernel regression. Linux VMs crashing (I did not test Windows guest VMs) bugzilla-daemon
` (24 preceding siblings ...)
2025-04-29 8:14 ` bugzilla-daemon
@ 2025-04-29 14:58 ` bugzilla-daemon
2025-04-29 15:02 ` bugzilla-daemon
` (23 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: bugzilla-daemon @ 2025-04-29 14:58 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
Cédric Le Goater (clg@kaod.org) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |clg@kaod.org
--- Comment #25 from Cédric Le Goater (clg@kaod.org) ---
"-cpu host,guest-phys-bits=39" should help to define compatible address spaces.
Could you please try it?
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
2025-04-27 0:46 [Bug 220057] New: Kernel regression. Linux VMs crashing (I did not test Windows guest VMs) bugzilla-daemon
` (25 preceding siblings ...)
2025-04-29 14:58 ` bugzilla-daemon
@ 2025-04-29 15:02 ` bugzilla-daemon
2025-04-29 15:09 ` bugzilla-daemon
` (22 subsequent siblings)
49 siblings, 0 replies; 51+ messages in thread
From: bugzilla-daemon @ 2025-04-29 15:02 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #26 from Adolfo (adolfotregosa@gmail.com) ---
(In reply to Cédric Le Goater from comment #25)
> "-cpu host,guest-phys-bits=39" should help to define compatible address
> spaces.
> Could you try please ?
Per https://pve.proxmox.com/wiki/Manual:_qm.conf , guest-phys-bits does not
exist in Proxmox, but yes, I can give it a try.
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
From: bugzilla-daemon @ 2025-04-29 15:09 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #27 from Adolfo (adolfotregosa@gmail.com) ---
(In reply to Cédric Le Goater from comment #25)
> "-cpu host,guest-phys-bits=39" should help to define compatible address
> spaces.
> Could you try please ?
vm 200 - unable to parse value of 'cpu' - format error
guest-phys-bits: property is not defined in schema and the schema does not
allow additional properties
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
From: bugzilla-daemon @ 2025-04-29 15:15 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #28 from Alex Williamson (alex.williamson@redhat.com) ---
Another option may be to set the cpu as "cpu: kvm64" which is the default. I
noted somewhere this should present a 40-bit physical address space, which
might be close enough.
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
From: bugzilla-daemon @ 2025-04-29 15:18 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #29 from Adolfo (adolfotregosa@gmail.com) ---
(In reply to Alex Williamson from comment #28)
> Another option may be to set the cpu as "cpu: kvm64" which is the default.
> I noted somewhere this should present a 40-bit physical address space, which
> might be close enough.
If I recall correctly, cpu must be set to 'host' for nvidia gpu passthrough to
work. That said, I would still prefer to keep the CPU set to 'host'.
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
From: bugzilla-daemon @ 2025-04-29 15:22 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #30 from Alex Williamson (alex.williamson@redhat.com) ---
(In reply to Adolfo from comment #29)
> (In reply to Alex Williamson from comment #28)
> > Another option may be to set the cpu as "cpu: kvm64" which is the default.
> > I noted somewhere this should present a 40-bit physical address space,
> which
> > might be close enough.
>
> If I recall correctly, cpu must be set to 'host' for nvidia gpu passthrough
> to work. That said, I would still prefer to keep the CPU set to 'host'.
kvm64 works just fine with NVIDIA GPU assignment to a Linux guest for me.
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
From: bugzilla-daemon @ 2025-04-29 15:25 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #31 from Adolfo (adolfotregosa@gmail.com) ---
(In reply to Alex Williamson from comment #30)
> (In reply to Adolfo from comment #29)
> > (In reply to Alex Williamson from comment #28)
> > > Another option may be to set the cpu as "cpu: kvm64" which is the
> default.
> > > I noted somewhere this should present a 40-bit physical address space,
> > which
> > > might be close enough.
> >
> > If I recall correctly, cpu must be set to 'host' for nvidia gpu passthrough
> > to work. That said, I would still prefer to keep the CPU set to 'host'.
>
> kvm64 works just fine with NVIDIA GPU assignment to a Linux guest for me.
I just tested it remotely. VM will not even start.
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
From: bugzilla-daemon @ 2025-04-29 15:35 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #32 from Adolfo (adolfotregosa@gmail.com) ---
I retested using qemu64 and kvm64. The VM will start but does not boot unless
cpu is set to host, at least using this 5060TI gpu. IIRC the same happened with
the 4060TI I had previously.
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
From: bugzilla-daemon @ 2025-04-29 18:36 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #33 from Adolfo (adolfotregosa@gmail.com) ---
These "VFIO_MAP_DMA failed" errors are not that uncommon, judging by a quick
Google search. My gut tells me they are not related to what I'm reporting here.
https://forum.proxmox.com/threads/vga-pass-issues-with-radeon-rx-7900-xtx-kvm-vfio_map_dma-failed-invalid-argument.156795/
https://forum.proxmox.com/threads/vfio_map_dma-failed-invalid-argument.125888/
https://bbs.archlinux.org/viewtopic.php?id=299106
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
From: bugzilla-daemon @ 2025-04-29 20:09 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #34 from Alex Williamson (alex.williamson@redhat.com) ---
Please run the following in the host before starting the guest and attach the
resulting host dmesg logs after running the guest:
# echo "func vfio_pci_mmap_huge_fault +p" > /proc/dynamic_debug/control
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
From: bugzilla-daemon @ 2025-04-30 0:20 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #35 from Adolfo (adolfotregosa@gmail.com) ---
Created attachment 308055
--> https://bugzilla.kernel.org/attachment.cgi?id=308055&action=edit
vm startup
Here is the full VM log, from startup to shutdown, captured after running:
echo "func vfio_pci_mmap_huge_fault +p" > /proc/dynamic_debug/control
on the host. I did not spot anything different from usual.
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
From: bugzilla-daemon @ 2025-04-30 0:41 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #36 from Alex Williamson (alex.williamson@redhat.com) ---
Please make sure the vfio-pci module is already loaded (i.e. modprobe vfio-pci)
before issuing the dynamic debug command; I fought with this some myself.
There should be vfio_pci_mmap_huge_fault lines in the log.
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
From: bugzilla-daemon @ 2025-04-30 7:32 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #37 from Adolfo (adolfotregosa@gmail.com) ---
Created attachment 308056
--> https://bugzilla.kernel.org/attachment.cgi?id=308056&action=edit
vm startup 2
I'm sure it was loaded. I have it set to load on host boot, and it binds all of
the VFs from the X710 network card. One of my other VMs is running OPNsense
with two VFs from the X710 passed through, so I'm 100% certain it was
loaded—otherwise, I wouldn't have internet access.
The issue turned out to be that I was running my patched kernel. I booted the
host using the Proxmox kernel 6.14.0 instead.
It now shows the information you requested.
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
From: bugzilla-daemon @ 2025-04-30 22:33 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #38 from Alex Williamson (alex.williamson@redhat.com) ---
Created attachment 308062
--> https://bugzilla.kernel.org/attachment.cgi?id=308062&action=edit
patch - log errors from huge fault handler
I'm still not seeing any leads for what might be the problem. The QEMU dumps
suggest a problem handling a page fault and the original report implicates the
new huge_fault handler. So let's log anything from the huge_fault handler that
doesn't install the pte. Please apply this patch to your stock v6.14.4 kernel
and report the resulting dmesg after the VM crash.
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
From: bugzilla-daemon @ 2025-05-01 9:05 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #39 from Adolfo (adolfotregosa@gmail.com) ---
Created attachment 308063
--> https://bugzilla.kernel.org/attachment.cgi?id=308063&action=edit
vm log
Log from stock 6.14.4 with only your debug patch applied, up to the VM crash.
Just in case, I also ran echo "func vfio_pci_mmap_huge_fault +p" >
/proc/dynamic_debug/control before starting the VM.
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
From: bugzilla-daemon @ 2025-05-01 14:42 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #40 from Alex Williamson (alex.williamson@redhat.com) ---
The vfio_pci_mmap_huge_fault logs with order >0 and ending in 0x800 are normal,
they're indicating we can't create the huge page mapping due to alignment
requirements, 0x800 is VM_FAULT_FALLBACK (ie. fallback to a smaller mapping).
However, there are three instances of:
May 01 10:01:37 pve QEMU[1972]: error: kvm run failed Bad address
May 01 10:01:37 pve QEMU[1972]: error: kvm run failed Bad address
May 01 10:01:37 pve QEMU[1972]: error: kvm run failed Bad address
And three instances of:
May 01 10:01:37 pve kernel: vfio-pci 0000:01:00.0:
vfio_pci_mmap_huge_fault(,order = 0) BAR 1 page offset 0x3798: 0x1
May 01 10:01:37 pve kernel: vfio-pci 0000:01:00.0:
vfio_pci_mmap_huge_fault(,order = 0) BAR 1 page offset 0x3710: 0x1
May 01 10:01:37 pve kernel: vfio-pci 0000:01:00.0:
vfio_pci_mmap_huge_fault(,order = 0) BAR 1 page offset 0x3688: 0x1
0x1 is VM_FAULT_OOM, so likely at some point in trying to insert the pte, we
got an -ENOMEM.
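
For readers working through similar logs, the trailing hex value on each
vfio_pci_mmap_huge_fault line can be decoded mechanically. What follows is a
minimal Python sketch and not part of the debug patch. The bit values are a
subset of the kernel's vm_fault_reason flags (include/linux/mm_types.h), the
regular expression assumes the log format quoted above, and the helper names
decode_fault and parse_line are hypothetical.

```python
import re

# Subset of the kernel's vm_fault_reason bits (include/linux/mm_types.h).
VM_FAULT_BITS = {
    0x001: "VM_FAULT_OOM",
    0x002: "VM_FAULT_SIGBUS",
    0x004: "VM_FAULT_MAJOR",
    0x100: "VM_FAULT_NOPAGE",
    0x200: "VM_FAULT_LOCKED",
    0x400: "VM_FAULT_RETRY",
    0x800: "VM_FAULT_FALLBACK",
}

# Assumed format, copied from the debug output quoted in this thread.
LINE_RE = re.compile(
    r"vfio_pci_mmap_huge_fault\(,order = (\d+)\) BAR (\d+) "
    r"page offset (0x[0-9a-fA-F]+): (0x[0-9a-fA-F]+)"
)


def decode_fault(code):
    """Return the names of all known flags set in a vm_fault_t value."""
    names = [name for bit, name in sorted(VM_FAULT_BITS.items()) if code & bit]
    known = 0
    for bit in VM_FAULT_BITS:
        known |= bit
    leftover = code & ~known
    if leftover:
        # Bits outside the subset above are reported as raw hex.
        names.append(hex(leftover))
    return names


def parse_line(line):
    """Extract order, BAR, page offset and decoded flags, or None if no match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    order, bar, pgoff, code = match.groups()
    return {
        "order": int(order),
        "bar": int(bar),
        "pgoff": int(pgoff, 16),
        "flags": decode_fault(int(code, 16)),
    }


if __name__ == "__main__":
    sample = "vfio_pci_mmap_huge_fault(,order = 0) BAR 1 page offset 0x3798: 0x1"
    print(parse_line(sample))
```

Filtering a dmesg dump through parse_line and keeping only the order 0 entries
whose flags are not empty would isolate the three failing faults listed above.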
The system has 128GB of RAM, 98GB of which is dedicated to 1G hugepages. This
VM is configured for 32GB. What happens if fewer hugepages are reserved?
Also note that if we were able to populate the MMIO mappings using huge pages,
which would occur if the VM BIOS had placed the mappings within the DMA
mappable range of the IOMMU (ie. the VFIO_MAP_DMA failures), we'd be using
fewer page table entries than even the previous code (ie. less memory). The
issue might simply come down to the fact that previously we attempted to fault
in the entire MMIO mapping on the first fault, at that time memory was
available, but now we fault on access with the expectation that we're faulting
less due to huge pages, but the latter is not coming to fruition due to the bad
VM configuration.
I think we're going to need to figure out if/how Proxmox enables setting
guest-phys-bits=39 or the host needs to free up some memory from hugepages.
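
To put rough numbers on the page table argument, here is a back-of-the-envelope
Python sketch. It assumes x86-64 paging (4 KiB base pages, 8-byte entries, 512
entries per table) and a hypothetical 16 GiB BAR; the actual BAR size is not
stated in this report, so treat the figures as illustrative only.

```python
PAGE_SIZE = 4 * 1024                 # x86-64 base page size in bytes
HUGE_2M = 2 * 1024 * 1024            # PMD-level huge mapping size in bytes
ENTRY_SIZE = 8                       # bytes per page table entry on x86-64
ENTRIES_PER_TABLE = PAGE_SIZE // ENTRY_SIZE   # 512 entries fill one 4 KiB table

# Hypothetical BAR size, chosen only for illustration.
BAR_BYTES = 16 * 1024**3


def leaf_entries(region_bytes, mapping_bytes):
    """Number of leaf entries needed to map the region at the given granularity."""
    return region_bytes // mapping_bytes


def pte_table_bytes(region_bytes):
    """Memory consumed by last-level page tables when mapping with 4 KiB PTEs."""
    entries = leaf_entries(region_bytes, PAGE_SIZE)
    tables = -(-entries // ENTRIES_PER_TABLE)   # ceiling division
    return tables * PAGE_SIZE


if __name__ == "__main__":
    print("4 KiB leaf entries:", leaf_entries(BAR_BYTES, PAGE_SIZE))
    print("2 MiB leaf entries:", leaf_entries(BAR_BYTES, HUGE_2M))
    print("PTE table memory (MiB):", pte_table_bytes(BAR_BYTES) // 1024**2)
```

Under these assumptions, mapping the whole BAR with 4 KiB PTEs costs tens of
MiB of last-level tables, whereas 2 MiB mappings need no last-level tables at
all. This only holds when the mappings are suitably aligned, which is the
condition the fallback path above reports as not being met.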
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
From: bugzilla-daemon @ 2025-05-01 15:03 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #41 from Adolfo (adolfotregosa@gmail.com) ---
Created attachment 308066
--> https://bugzilla.kernel.org/attachment.cgi?id=308066&action=edit
hugepages_testing
I've done two tests. First, I reduced the host hugepages reservation to 40 and
ran the VM until it crashed. Then I configured the VM to not use 1G hugepages
(Proxmox uses transparent hugepages in this case) and logged the VM up to the
crash once more.
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
From: bugzilla-daemon @ 2025-05-01 15:21 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #42 from Alex Williamson (alex.williamson@redhat.com) ---
If it's not a physical memory limit (I'm not able to reproduce even allocating
60 1GB hugepages on a 64GB host), it may be that proxmox is imposing cgroup
memory limits on the VM. It still doesn't make sense to me how huge_fault
support could result in more memory used by the page tables though. The
previous behavior is effectively the worst case scenario where the full device
memory is mapped as ptes.
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
From: bugzilla-daemon @ 2025-05-01 15:28 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #43 from Adolfo (adolfotregosa@gmail.com) ---
(In reply to Alex Williamson from comment #42)
> If it's not a physical memory limit (I'm not able to reproduce even
> allocating 60 1GB hugepages on a 64GB host), it may be that proxmox is
> imposing cgroup memory limits on the VM. It still doesn't make sense to me
> how huge_fault support could result in more memory used by the page tables
> though. The previous behavior is effectively the worst case scenario where
> the full device memory is mapped as ptes.
Would you like to connect remotely to the machine? If so, I can email at
alex.williamson@redhat.com with the AnyDesk ID for a laptop that you can use to
SSH into the machine and do your magic.
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
From: bugzilla-daemon @ 2025-05-02 16:12 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #44 from Alex Williamson (alex.williamson@redhat.com) ---
Created attachment 308071
--> https://bugzilla.kernel.org/attachment.cgi?id=308071&action=edit
align faults
Please test this as a potential fix. It includes the debugging from last time,
so you'll want to unapply that first to get to a clean 6.14.4 base.
The theory here is that we might be getting a VM_FAULT_OOM due to a race rather
than an actual -ENOMEM condition, and while the mm should interpret the failure
differently and handle it, we might avoid the race and use the page table more
efficiently in this scenario if we actively align mappings to create huge pages
rather than deferring non-huge aligned faults to smaller mappings.
Please report results and also attach dmesg with logging for additional
confirmation. Thanks
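
The alignment idea can be illustrated with plain address arithmetic. The Python
sketch below is not the attached patch and does not reproduce its logic; it
only shows one way a fault address could be rounded down to a 2^order page
boundary and checked against the mapping before a huge entry is attempted. The
function name aligned_huge_range and all parameter names are hypothetical, and
4 KiB base pages are assumed.

```python
PAGE_SHIFT = 12                      # assumes 4 KiB base pages
PAGE_SIZE = 1 << PAGE_SHIFT


def aligned_huge_range(fault_addr, order, vma_start, vma_end, base_pfn):
    """Return (addr, pfn) for a 2**order-page mapping covering fault_addr.

    Returns None when the aligned range does not fit inside the VMA or the
    backing pfn is not aligned to the same order, i.e. the cases where a
    handler would have to fall back to a smaller mapping.
    """
    size = PAGE_SIZE << order
    addr = fault_addr & ~(size - 1)              # round down to the order boundary
    if addr < vma_start or addr + size > vma_end:
        return None                              # aligned range leaves the VMA
    pfn = base_pfn + ((addr - vma_start) >> PAGE_SHIFT)
    if pfn & ((1 << order) - 1):
        return None                              # physical side is misaligned
    return addr, pfn


if __name__ == "__main__":
    vma_start = 0x7F0000000000                   # hypothetical, 2 MiB aligned
    vma_end = vma_start + 0x10000000             # hypothetical 256 MiB mapping
    base_pfn = 0x380000000                       # hypothetical, 2 MiB aligned
    fault = vma_start + (0x3798 << PAGE_SHIFT)   # page offset seen in the log
    print(aligned_huge_range(fault, 9, vma_start, vma_end, base_pfn))
```

With order 9 (2 MiB on x86-64), a fault at page offset 0x3798 rounds down to
page offset 0x3600, so a single huge entry would cover that page together with
its 511 neighbours instead of each one faulting separately.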
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
From: bugzilla-daemon @ 2025-05-02 17:15 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #45 from Adolfo (adolfotregosa@gmail.com) ---
Created attachment 308072
--> https://bugzilla.kernel.org/attachment.cgi?id=308072&action=edit
log with new patch
Patch applied to the newly released 6.14.5 kernel. The VM no longer crashes.
Log attached. There are still "VFIO_MAP_DMA failed" messages, but so far I
cannot make the VM crash.
Thank you for not giving up.
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
From: bugzilla-daemon @ 2025-05-02 22:52 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #46 from Alex Williamson (alex.williamson@redhat.com) ---
Thanks for the prompt testing, I've posted the fix without the debug logging
here:
https://lore.kernel.org/all/20250502224035.3183451-1-alex.williamson@redhat.com/
Pending reviews and comments, I'll try to get it in for v6.15 for backport to
stable trees.
The VFIO_MAP_DMA failures are a VM configuration error and a byproduct of Intel
shipping platforms where the CPU physical address width is different from
the IOMMU address width. QEMU/vBIOS defines the MMIO layout relative to the
CPU address width, therefore the vCPU needs to reflect that address width
restriction. QEMU makes this configuration available through a guest-phys-bits
option, but it doesn't appear that Proxmox provides a way to configure this.
The result is these error logs, which indicate P2P DMA mappings are not being
created. With the fix we're pursuing above, this should not result in a
performance/efficiency loss relative to the page table use though.
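
The width mismatch can be checked against the addresses in the error messages
themselves. The Python sketch below takes the IOVA and size pairs from the
vfio_container_dma_map() failures reported in this thread and tests them
against an address width. The 39-bit IOMMU width is an assumption inferred from
the suggested guest-phys-bits=39, and the helper names are hypothetical.

```python
# IOVA and size pairs copied from the vfio_container_dma_map() errors in this thread.
FAILED_MAPPINGS = [
    (0x380000000000, 0x10000),
    (0x380000011000, 0x3000),
]

# Assumption: 39-bit IOMMU address width, inferred from the suggested
# guest-phys-bits=39 rather than read from the hardware.
IOMMU_BITS = 39


def bits_needed(iova, size):
    """Minimum address width able to express the last byte of the mapping."""
    return (iova + size - 1).bit_length()


def fits(iova, size, width_bits):
    """True if [iova, iova + size) lies entirely below 2**width_bits."""
    return iova + size <= (1 << width_bits)


def report(mappings, width_bits):
    """Build one human-readable line per mapping."""
    lines = []
    for iova, size in mappings:
        verdict = "ok" if fits(iova, size, width_bits) else "out of range"
        lines.append(
            f"{iova:#x}+{size:#x}: needs {bits_needed(iova, size)} bits, "
            f"{verdict} for a {width_bits}-bit address space"
        )
    return lines


if __name__ == "__main__":
    for line in report(FAILED_MAPPINGS, IOMMU_BITS):
        print(line)
```

Both failing mappings sit far above the 512 GiB limit of a 39-bit address
space, which is consistent with the explanation that the guest placed the MMIO
window according to the CPU's wider physical address width.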
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
From: bugzilla-daemon @ 2025-05-02 23:06 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
Adolfo (adolfotregosa@gmail.com) changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution|--- |PATCH_ALREADY_AVAILABLE
--- Comment #47 from Adolfo (adolfotregosa@gmail.com) ---
Thank you once more. I'll mark it as resolved.
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
From: bugzilla-daemon @ 2025-05-06 9:47 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #48 from Fabian Grünbichler (f.gruenbichler@proxmox.com) ---
Also see
https://lore.kernel.org/qemu-devel/20250130134346.1754143-1-clg@redhat.com/
I filed a bug about getting guest-phys-bits exposed on the PVE side as well:
https://bugzilla.proxmox.com/show_bug.cgi?id=6378
* [Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)
From: bugzilla-daemon @ 2025-05-06 9:57 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=220057
--- Comment #49 from Cédric Le Goater (clg@kaod.org) ---
This series was partially merged and QEMU should now report the VFIO_MAP_DMA
error only once.
Checking that the IOMMU address space width is smaller than the CPU physical
address width needs some rework. I agree it would be good to have for consumer
grade processors or when using a vIOMMU device with a 39-bit address width.