* xl create/save throwing errors
From: Petr Beneš @ 2025-02-19 16:04 UTC
To: Xen-devel; +Cc: Anthony PERARD, Andrew Cooper
Hello,
I have a script that's supposed to start a couple of (Windows 10) VMs
in parallel, wait until they boot and connect to the network, and then
create a live snapshot.
The VMs are created with a simple "xl create vm.cfg", and the live snapshot is
taken with "xl save win10-18362-NNN path/to/state".
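A minimal sketch of the parallel snapshot step, assuming a hypothetical layout where each VM's state goes to `$STATE_ROOT/<name>/state` and VM names contain no spaces or colons. `XL` and `STATE_ROOT` are illustrative overrides (not xl features) so that `xl` can be stubbed out. Waiting on each background PID separately means every failed save is reported by VM name instead of getting lost in interleaved output.

```sh
#!/bin/sh
# Sketch: snapshot several domains in parallel with "xl save" and report
# which ones failed. Hypothetical assumptions: each state file goes to
# $STATE_ROOT/<name>/state, and VM names contain no spaces or colons.
# XL can be pointed at a stub for testing; it defaults to the real xl.
XL=${XL:-xl}
STATE_ROOT=${STATE_ROOT:-/clones}

save_all_parallel() {
    local name entry pids failed
    pids=""
    for name in "$@"; do
        # Per xl(1), a plain "xl save" stops the domain once it is saved;
        # "xl save -c" would leave it running instead.
        "$XL" save "$name" "$STATE_ROOT/$name/state" &
        # Remember "pid:name" so a failure can be attributed to its VM.
        pids="$pids $!:$name"
    done
    failed=0
    for entry in $pids; do
        if ! wait "${entry%%:*}"; then
            echo "save failed: ${entry#*:}" >&2
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every save succeeded.
    [ "$failed" -eq 0 ]
}
```

For example, `save_all_parallel win10-18362-101 win10-18362-102 win10-18362-104` returns non-zero and names the culprit(s) on stderr if any of the three saves failed.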
I have noticed that "xl create" occasionally prints this line:
```
libxl: error: libxl_aoutils.c:646:libxl__kill_xs_path: qemu command-line probe already exited
```
At first I thought it was related to running multiple "xl create"
commands in parallel, but to my surprise, this line sometimes appears
even for a standalone "xl create" command.
However, when "xl save" is executed in parallel, I very often get
output similar to this:
```
Saving to win10-18362-102/state new xl format (info 0x3/0x0/1780)
xc: info: Saving domain 193, type x86 HVM
Saving to win10-18362-101/state new xl format (info 0x3/0x0/1780)
xc: info: Saving domain 192, type x86 HVM
Saving to win10-18362-104/state new xl format (info 0x3/0x0/1780)
xc: info: Saving domain 194, type x86 HVM
xc: error: save callback suspend() failed: 0: Internal error
xc: error: Save failed (0 = Success): Internal error
libxl: error: libxl_stream_write.c:347:libxl__xc_domain_save_done: Domain 192:saving domain: domain responded to suspend request: Success
Failed to save domain, resuming domain
xc: error: save callback suspend() failed: 0: Internal error
xc: error: Save failed (0 = Success): Internal error
xc: error: Dom 192 not suspended: (shutdown 4, reason 3): Internal error
libxl: error: libxl_dom_suspend.c:661:domain_resume_done: Domain 192:xc_domain_resume failed: Invalid argument
libxl: error: libxl_stream_write.c:347:libxl__xc_domain_save_done: Domain 194:saving domain: domain responded to suspend request: Success
Failed to save domain, resuming domain
xc: error: Dom 194 not suspended: (shutdown 4, reason 3): Internal error
libxl: error: libxl_dom_suspend.c:661:domain_resume_done: Domain 194:xc_domain_resume failed: Invalid argument
xc: Frames: 1044480/1044480 100%: Frames: 52224/1044480 5%
```
This is the output of snapshotting 4 live VMs in parallel, where 3 of
the commands failed and left their VMs running.
Note that each "xl create"/"xl save" is executed for a separate VM.
For several months, I have been running standalone "xl save" commands on
VMs with the same settings without any problems.
Note that my VMs use qcow2 images as their disks - not ZFS or LVM:
```
disk = [ 'tap:qcow2:/win10-18362-101/clone/image.qcow2,xvda,w' ]
```
where win10-18362-101/clone/image.qcow2 is created as:
```
qemu-img create -f qcow2 -F qcow2 -b \
    "/win10-18362-101/base/image.qcow2" \
    "/win10-18362-101/clone/image.qcow2"
```
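The same kind of overlay can be created for each VM with a small helper. This is a sketch under stated assumptions: `ROOT` (empty by default, which reproduces the absolute paths above) and the `QEMU_IMG` override are hypothetical conveniences, and each VM's base image is assumed to already exist.

```sh
#!/bin/sh
# Sketch: create a per-VM qcow2 overlay on top of that VM's base image,
# using the same qemu-img invocation as above. ROOT (empty by default,
# giving /<name>/base/... paths) and QEMU_IMG are hypothetical overrides.
QEMU_IMG=${QEMU_IMG:-qemu-img}
ROOT=${ROOT:-}

make_overlay() {
    local name base clone
    name=$1
    base="$ROOT/$name/base/image.qcow2"
    clone="$ROOT/$name/clone/image.qcow2"
    mkdir -p "$ROOT/$name/clone" || return 1
    # -f: overlay format, -F: backing file format, -b: backing file.
    "$QEMU_IMG" create -f qcow2 -F qcow2 -b "$base" "$clone"
}
```

Usage: `for vm in win10-18362-101 win10-18362-102; do make_overlay "$vm"; done`.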
Is running "xl save" in parallel not supported? Or is it an issue with
qcow2 handling?
Best,
Petr
* Re: xl create/save throwing errors
From: Petr Beneš @ 2025-02-19 16:53 UTC
To: Xen-devel; +Cc: Anthony PERARD, Andrew Cooper
On Wed, Feb 19, 2025 at 5:04 PM Petr Beneš <w1benny@gmail.com> wrote:
>
> Hello,
>
To add more information and observations:
I'm running Xen 4.20-rc on an MFF Dell OptiPlex with an i5-12500T CPU (6
cores, 12 threads). I have allocated 8 cores to dom0. Now:
- "xl save" of 4 VMs with 4 VCPUs each tends to fail
- "xl save" of 4 VMs with 2 VCPUs each hasn't failed so far
- "xl save" of 8 VMs with 2 VCPUs each hasn't failed so far
- "xl save" of 12 VMs with 2 VCPUs each hasn't failed either
Note that there's always enough memory for all the VMs and dom0.
I have also observed new error lines when "xl create" is run
in parallel:
```
libxl: error: libxl_qmp.c:1399:qmp_ev_fd_callback: Domain 89:error on QMP socket: Connection reset by peer
libxl: error: libxl_qmp.c:1438:qmp_ev_fd_callback: Domain 89:Error happened with the QMP connection to QEMU
```
* Re: xl create/save throwing errors
From: Petr Beneš @ 2025-02-19 17:23 UTC
To: Xen-devel; +Cc: Anthony PERARD, Andrew Cooper
On Wed, Feb 19, 2025 at 5:53 PM Petr Beneš <w1benny@gmail.com> wrote:
>
> On Wed, Feb 19, 2025 at 5:04 PM Petr Beneš <w1benny@gmail.com> wrote:
> >
> > Hello,
> >
>
> To add more information and observations:
Even more observations. This is from a run where 4 VMs (4 VCPUs each)
were being created in parallel:
```
Saving to /clones/win10-18362-102/state new xl format (info 0x3/0x0/1780)
xc: info: Saving domain 14, type x86 HVM
xc: error: save callback suspend() failed: 0: Internal error
xc: error: Save failed (0 = Success): Internal error
libxl: error: libxl_qmp.c:1334:qmp_ev_lock_aquired: Domain 14:Failed to connect to QMP socket /var/run/xen/qmp-libxl-14: No such file or directory
libxl: error: libxl_dom_save.c:246:switch_qemu_xen_logdirty_done: Domain 14:logdirty switch failed (rc=-3), abandoning suspend
xc: error: Couldn't disable qemu log-dirty mode (0 = Success): Internal error
xc: error: Failed to clean up (0 = Success): Internal error
libxl: error: libxl_stream_write.c:347:libxl__xc_domain_save_done: Domain 14:saving domain: domain responded to suspend request: Success
Failed to save domain, resuming domain
libxl: error: libxl_qmp.c:1334:qmp_ev_lock_aquired: Domain 14:Failed to connect to QMP socket /var/run/xen/qmp-libxl-14: No such file or directory
libxl: error: libxl_dom_suspend.c:610:dm_resume_done: Domain 14:Failed to resume device model: rc=-3
```
But... running afterwards:
```
# xl list
Name                ID    Mem  VCPUs  State   Time(s)
Domain-0             0  16384      6  r-----    475.1
win10-18362-102     17   2048      4  -b----     30.8
```
And:
```
# lsa /var/run/xen/
total 16K
drwxr-xr-x 2 root root 160 Feb 19 17:13 .
drwxr-xr-x 36 root root 1.1K Feb 19 17:07 ..
-rw-r--r-- 1 root root 28 Feb 19 17:13 domid-history
-rw------- 1 root root 4 Feb 19 17:06 qemu-dom0.pid
srwxr-xr-x 1 root root 0 Feb 19 17:13 qmp-libxenstat-17
srwxr-xr-x 1 root root 0 Feb 19 17:13 qmp-libxl-17
-rw------- 1 root root 5 Feb 19 17:06 xenconsoled.pid
-rw-r----- 1 root root 4 Feb 19 17:06 xenstored.pid
```
The logs complain about domain ID 14; however, win10-18362-102 is
later observed to have domain ID 17.
P.
* Re: xl create/save throwing errors
From: Petr Beneš @ 2025-02-19 18:08 UTC
To: Xen-devel; +Cc: Anthony PERARD, Andrew Cooper
On Wed, Feb 19, 2025 at 6:23 PM Petr Beneš <w1benny@gmail.com> wrote:
>
> On Wed, Feb 19, 2025 at 5:53 PM Petr Beneš <w1benny@gmail.com> wrote:
> >
> > On Wed, Feb 19, 2025 at 5:04 PM Petr Beneš <w1benny@gmail.com> wrote:
> > >
> > > Hello,
> > >
> >
> > To add more information and observations:
>
> Even more observations.
Next observations:
The domain ID mismatch seems to be caused by the VM crashing
(bugcheck) and being rebooted at the moment of "xl save", which gives
it a new domain ID.
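One way to confirm this from the script side is to compare the domain ID before and after the save; `xl domid NAME` prints the numeric ID of a named domain. A sketch, assuming the VM is restarted after a crash (as it evidently is here, so a crash shows up as a new ID); the function name and the `XL` override are hypothetical.

```sh
#!/bin/sh
# Sketch: run "xl save" and flag the case where the domain's ID changed,
# i.e. the guest went away and was recreated under a new ID (as happens
# when a crashed VM is restarted). XL may be pointed at a stub for testing.
XL=${XL:-xl}

save_and_check_domid() {
    local name state before after rc
    name=$1 state=$2
    # "xl domid NAME" prints the numeric ID of the named domain.
    before=$("$XL" domid "$name") || return 2
    "$XL" save "$name" "$state"
    rc=$?
    # If the name no longer resolves (e.g. a plain save succeeded and the
    # domain is gone), there is nothing to compare.
    if after=$("$XL" domid "$name" 2>/dev/null) && [ "$after" != "$before" ]; then
        echo "$name: domain ID changed $before -> $after (guest likely crashed and restarted)" >&2
        return 3
    fi
    return "$rc"
}
```

A return value of 3 then distinguishes "the guest was recreated during the save" from an ordinary save failure.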
At this point I'm quite baffled and am wondering whether "xl save"
randomly causes my VMs to crash or whether my image.qcow2 is
corrupted.
I'll try with different images.
P.
* Re: xl create/save throwing errors
From: Jan Beulich @ 2025-02-20 8:14 UTC
To: Petr Beneš; +Cc: Anthony PERARD, Andrew Cooper, Xen-devel
On 19.02.2025 17:04, Petr Beneš wrote:
> Hello,
>
> I have a script that's supposed to start a couple of (Windows 10) VMs
> in parallel, wait until they boot and connect to the network, and then
> create a live snapshot.
>
> VMs are created by simple "xl create vm.cfg" and the live snapshot is
> created by "xl save win10-18362-NNN path/to/state".
>
> I have noticed, that "xl create" occasionally throws this line:
> ```
> libxl: error: libxl_aoutils.c:646:libxl__kill_xs_path: qemu
> command-line probe already exited
> ```
>
> First I thought it's related to the fact that multiple "xl create"
> commands are being run in parallel, but to my surprise, this line
> sometimes occurs even for standalone "xl create" commands.
>
> However, when "xl save" is being executed in parallel, I'm very often
> met with output similar to this:
> ```
> Saving to win10-18362-102/state new xl format (info 0x3/0x0/1780)
> xc: info: Saving domain 193, type x86 HVM
> Saving to win10-18362-101/state new xl format (info 0x3/0x0/1780)
> xc: info: Saving domain 192, type x86 HVM
> Saving to win10-18362-104/state new xl format (info 0x3/0x0/1780)
> xc: info: Saving domain 194, type x86 HVM
> xc: error: save callback suspend() failed: 0: Internal error
> xc: error: Save failed (0 = Success): Internal error
> libxl: error: libxl_stream_write.c:347:libxl__xc_domain_save_done:
> Domain 192:saving domain: domain responded to suspend request: Success
> Failed to save domain, resuming domain
> xc: error: save callback suspend() failed: 0: Internal error
> xc: error: Save failed (0 = Success): Internal error
> xc: error: Dom 192 not suspended: (shutdown 4, reason 3): Internal error
> libxl: error: libxl_dom_suspend.c:661:domain_resume_done: Domain
> 192:xc_domain_resume failed: Invalid argument
> libxl: error: libxl_stream_write.c:347:libxl__xc_domain_save_done:
> Domain 194:saving domain: domain responded to suspend request: Success
> Failed to save domain, resuming domain
> xc: error: Dom 194 not suspended: (shutdown 4, reason 3): Internal error
> libxl: error: libxl_dom_suspend.c:661:domain_resume_done: Domain
> 194:xc_domain_resume failed: Invalid argument
> xc: Frames: 1044480/1044480 100%: Frames: 52224/1044480 5%
> ```
Just one thing: to (hopefully) get a better understanding of where
those errors originate, you may want to increase the verbosity of
"xl save", e.g. "xl -vvv save".
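When several saves run at once, their verbose output interleaves, as in the log above. A sketch that gives each `xl -vvv save` its own log file and then lists the logs containing error lines; `LOG_DIR`, the state-path layout and the `XL` override are hypothetical, and both stdout and stderr are captured since xl's log messages normally go to stderr.

```sh
#!/bin/sh
# Sketch: run "xl -vvv save" for several domains in parallel, giving each
# its own log file so messages from different domains don't interleave.
# LOG_DIR, STATE_ROOT and the XL override are hypothetical.
XL=${XL:-xl}
LOG_DIR=${LOG_DIR:-./xl-save-logs}
STATE_ROOT=${STATE_ROOT:-/clones}

verbose_save_all() {
    local name pid pids rc
    pids="" rc=0
    mkdir -p "$LOG_DIR" || return 1
    for name in "$@"; do
        # Capture both stdout and stderr into this domain's own log.
        "$XL" -vvv save "$name" "$STATE_ROOT/$name/state" \
            >"$LOG_DIR/$name.log" 2>&1 &
        pids="$pids $!"
    done
    for pid in $pids; do
        wait "$pid" || rc=1
    done
    return "$rc"
}

# Print the names of the log files containing at least one
# "libxl: error:" or "xc: error:" line.
logs_with_errors() {
    grep -l -E '^(libxl|xc): error:' "$LOG_DIR"/*.log 2>/dev/null
}
```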
Jan
* Re: xl create/save throwing errors
From: Petr Beneš @ 2025-02-25 22:59 UTC
To: Jan Beulich; +Cc: Anthony PERARD, Andrew Cooper, Xen-devel
On Thu, Feb 20, 2025 at 9:14 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> Just one thing - to (hopefully) get a better understanding of the origin of
> those errors, you may want to increase verbosity of the "xl save", e.g.
> "xl -vvv save".
>
> Jan
Here's the output of this command, which failed:
xl -vvv save win10-18362-103 /opt/ramdisk/vms/clones/win10-18362-103/state
libxl: debug: libxl_domain.c:2295:libxl_retrieve_domain_configuration: Domain 90:ao 0x555eef6f1bd0: create: how=(nil) callback=(nil) poller=0x555eef6fca50
libxl: debug: libxl_domain.c:2311:libxl_retrieve_domain_configuration: Domain 90:ao 0x555eef6f1bd0: inprogress: poller=0x555eef6fca50, flags=i
libxl: debug: libxl_qmp.c:1884:libxl__ev_qmp_send: Domain 90: ev 0x555eef6feeb0, cmd 'query-cpus-fast'
libxl: debug: libxl_qmp.c:1324:qmp_ev_lock_aquired: Domain 90:Connecting to /var/run/xen/qmp-libxl-90
libxl: debug: libxl_qmp.c:1699:qmp_ev_handle_message: Domain 90:QEMU version: 8.0.4
libxl: debug: libxl_domain.c:2587:retrieve_domain_configuration_end: Domain 90:No vtpm from xenstore
libxl: debug: libxl_domain.c:2587:retrieve_domain_configuration_end: Domain 90:No vusb from xenstore
libxl: debug: libxl_domain.c:2587:retrieve_domain_configuration_end: Domain 90:No vusb from xenstore
libxl: debug: libxl_domain.c:2587:retrieve_domain_configuration_end: Domain 90:No pci from xenstore
libxl: debug: libxl_domain.c:2587:retrieve_domain_configuration_end: Domain 90:No vdispl from xenstore
libxl: debug: libxl_domain.c:2587:retrieve_domain_configuration_end: Domain 90:No vsnd from xenstore
libxl: debug: libxl_qmp.c:1920:libxl__ev_qmp_dispose: Domain 90: ev 0x555eef6feeb0
libxl: debug: libxl_event.c:2067:libxl__ao_complete: ao 0x555eef6f1bd0: complete, rc=0
libxl: debug: libxl_event.c:2036:libxl__ao__destroy: ao 0x555eef6f1bd0: destroy
Saving to /opt/ramdisk/vms/clones/win10-18362-103/state new xl format (info 0x3/0x0/1793)
libxl: debug: libxl_domain.c:508:libxl_domain_suspend: Domain 90:ao 0x555eef6f66d0: create: how=(nil) callback=(nil) poller=0x555eef6fca50
libxl: debug: libxl.c:721:libxl__fd_flags_modify_save: fnctl F_GETFL flags for fd 14 are 0x8001
libxl: debug: libxl.c:729:libxl__fd_flags_modify_save: fnctl F_SETFL of fd 14 to 0x8001
libxl: debug: libxl_domain.c:536:libxl_domain_suspend: Domain 90:ao 0x555eef6f66d0: inprogress: poller=0x555eef6fca50, flags=i
libxl-save-helper: debug: starting save: Success
xc: detail: fd 14, dom 90, flags 0, hvm 1
xc: info: Saving domain 90, type x86 HVM
libxl: debug: libxl_qmp.c:1884:libxl__ev_qmp_send: Domain 90: ev 0x555eef703608, cmd 'xen-set-global-dirty-log'
libxl: debug: libxl_qmp.c:1324:qmp_ev_lock_aquired: Domain 90:Connecting to /var/run/xen/qmp-libxl-90
libxl: debug: libxl_qmp.c:1699:qmp_ev_handle_message: Domain 90:QEMU version: 8.0.4
libxl: debug: libxl_event.c:863:libxl__ev_xswatch_deregister: watch w=0x555eef703590: deregister unregistered
libxl: debug: libxl_qmp.c:1920:libxl__ev_qmp_dispose: Domain 90: ev 0x555eef703608
libxl: debug: libxl_dom_suspend.c:190:domain_suspend_callback_common: Domain 90:Calling xc_domain_shutdown on HVM domain
libxl: debug: libxl_dom_suspend.c:300:domain_suspend_common_wait_guest: Domain 90:wait for the guest to suspend
libxl: debug: libxl_event.c:812:libxl__ev_xswatch_register: watch w=0x555eef702bf0 wpath=@releaseDomain token=3/0: register slotnum=3
libxl: debug: libxl_event.c:750:watchfd_callback: watch w=0x555eef702bf0 wpath=@releaseDomain token=3/0: event epath=@releaseDomain
libxl: debug: libxl_dom_suspend.c:348:suspend_common_wait_guest_check: Domain 90:guest we were suspending has shut down with unexpected reason code 3
libxl: debug: libxl_event.c:863:libxl__ev_xswatch_deregister: watch w=0x555eef702bd8: deregister unregistered
libxl: debug: libxl_event.c:849:libxl__ev_xswatch_deregister: watch w=0x555eef702bf0 wpath=@releaseDomain token=3/0: deregister slotnum=3
libxl: debug: libxl_qmp.c:1920:libxl__ev_qmp_dispose: ev 0x555eef702c68
xc: error: save callback suspend() failed: 0: Internal error
xc: error: Save failed (0 = Success): Internal error
libxl: debug: libxl_qmp.c:1884:libxl__ev_qmp_send: Domain 90: ev 0x555eef703608, cmd 'xen-set-global-dirty-log'
libxl: debug: libxl_qmp.c:1324:qmp_ev_lock_aquired: Domain 90:Connecting to /var/run/xen/qmp-libxl-90
libxl: debug: libxl_qmp.c:1699:qmp_ev_handle_message: Domain 90:QEMU version: 8.0.4
libxl: debug: libxl_event.c:863:libxl__ev_xswatch_deregister: watch w=0x555eef703590: deregister unregistered
libxl: debug: libxl_qmp.c:1920:libxl__ev_qmp_dispose: Domain 90: ev 0x555eef703608
libxl-save-helper: debug: complete r=-1: Success
libxl: error: libxl_stream_write.c:347:libxl__xc_domain_save_done: Domain 90:saving domain: domain responded to suspend request: Success
libxl: debug: libxl.c:748:libxl__fd_flags_restore: fnctl F_SETFL of fd 14 to 0x8001
libxl: debug: libxl_event.c:2067:libxl__ao_complete: ao 0x555eef6f66d0: complete, rc=-3
libxl: debug: libxl_event.c:2036:libxl__ao__destroy: ao 0x555eef6f66d0: destroy
Failed to save domain, resuming domain
libxl: debug: libxl_domain.c:184:libxl_domain_resume: Domain 90:ao 0x555eef702050: create: how=(nil) callback=(nil) poller=0x555eef6fca50
libxl: debug: libxl_qmp.c:1884:libxl__ev_qmp_send: Domain 90: ev 0x555eef6ff208, cmd 'cont'
libxl: debug: libxl_domain.c:192:libxl_domain_resume: Domain 90:ao 0x555eef702050: inprogress: poller=0x555eef6fca50, flags=i
libxl: debug: libxl_qmp.c:1324:qmp_ev_lock_aquired: Domain 90:Connecting to /var/run/xen/qmp-libxl-90
libxl: debug: libxl_qmp.c:1699:qmp_ev_handle_message: Domain 90:QEMU version: 8.0.4
libxl: debug: libxl_qmp.c:1920:libxl__ev_qmp_dispose: Domain 90: ev 0x555eef6ff208
libxl: debug: libxl_event.c:863:libxl__ev_xswatch_deregister: watch w=0x555eef6ff378: deregister unregistered
xc: error: Dom 90 not suspended: (shutdown 4, reason 3): Internal error
libxl: error: libxl_dom_suspend.c:661:domain_resume_done: Domain 90:xc_domain_resume failed: Invalid argument
libxl: debug: libxl_event.c:2067:libxl__ao_complete: ao 0x555eef702050: complete, rc=-3
libxl: debug: libxl_event.c:2036:libxl__ao__destroy: ao 0x555eef702050: destroy
xencall:buffer: debug: total allocations:69 total releases:69
xencall:buffer: debug: current allocations:0 maximum allocations:2
xencall:buffer: debug: cache current size:2
xencall:buffer: debug: cache hits:51 misses:2 toobig:16
xencall:buffer: debug: total allocations:0 total releases:0
xencall:buffer: debug: current allocations:0 maximum allocations:0
xencall:buffer: debug: cache current size:0
xencall:buffer: debug: cache hits:0 misses:0 toobig:0
* Re: xl create/save throwing errors
From: Marek Marczykowski-Górecki @ 2025-02-26 0:49 UTC
To: Petr Beneš; +Cc: Jan Beulich, Anthony PERARD, Andrew Cooper, Xen-devel
On Tue, Feb 25, 2025 at 11:59:38PM +0100, Petr Beneš wrote:
> On Thu, Feb 20, 2025 at 9:14 AM Jan Beulich <jbeulich@suse.com> wrote:
> >
> > Just one thing - to (hopefully) get a better understanding of the origin of
> > those errors, you may want to increase verbosity of the "xl save", e.g.
> > "xl -vvv save".
> >
> > Jan
>
> Here's an output of this command, that failed:
> xl -vvv save win10-18362-103 /opt/ramdisk/vms/clones/win10-18362-103/state
>
> libxl: debug: libxl_dom_suspend.c:348:suspend_common_wait_guest_check:
> Domain 90:guest we were suspending has shut down with unexpected
> reason code 3
This is a domain crash.
Is there anything interesting in the console log of that domain (if it has
some debug logging there...), or maybe in xl dmesg?
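For reference, the numeric reason in messages such as "reason code 3" and "(shutdown 4, reason 3)" maps onto the `SHUTDOWN_*` constants in Xen's public header `xen/include/public/sched.h`, where 2 is `SHUTDOWN_suspend` (what a save waits for) and 3 is `SHUTDOWN_crash`. A small decoder sketch; the function names are made up for illustration.

```sh
#!/bin/sh
# Sketch: name the numeric shutdown reason printed by libxl/libxc.
# Values follow the SHUTDOWN_* constants in Xen's public header
# xen/include/public/sched.h.
xen_shutdown_reason() {
    case $1 in
        0) echo poweroff ;;
        1) echo reboot ;;
        2) echo suspend ;;
        3) echo crash ;;
        4) echo watchdog ;;
        5) echo soft_reset ;;
        *) echo "unknown($1)" ;;
    esac
}

# Extract N from a libxl line ending in "... unexpected reason code N".
reason_from_log_line() {
    printf '%s\n' "$1" | sed -n 's/.*reason code \([0-9][0-9]*\).*/\1/p'
}
```

Feeding the quoted "guest we were suspending has shut down with unexpected reason code 3" line through `xen_shutdown_reason "$(reason_from_log_line "$line")"` yields `crash`.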
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
* Re: xl create/save throwing errors
From: Petr Beneš @ 2025-02-26 2:29 UTC
To: Marek Marczykowski-Górecki
Cc: Jan Beulich, Anthony PERARD, Andrew Cooper, Xen-devel
On Wed, Feb 26, 2025 at 1:50 AM Marek Marczykowski-Górecki
<marmarek@invisiblethingslab.com> wrote:
>
> This is domain crash.
> Anything interesting on the console log of that domain (if it has some
> debug logs there...), or maybe in xl dmesg?
>
> --
> Best Regards,
> Marek Marczykowski-Górecki
> Invisible Things Lab
I figured. The domain simply crashes (bugchecks), and, frustratingly,
the generated MEMORY.DMP is corrupted.
xl dmesg shows:
(XEN) d157: VIRIDIAN GUEST_OS_ID: vendor: 0x1 os: 0x4 major: 0xa minor: 0 sp: 0 build: 0x271b
(XEN) d157: VIRIDIAN HYPERCALL: enabled: 1 pfn: 0x20e
(XEN) d157v0: VIRIDIAN VP_ASSIST: pfn: 0xc
(XEN) d157: VIRIDIAN HVCALL_NOTIFY_LONG_SPIN_WAIT
(XEN) d157: VIRIDIAN MSR_TIME_REF_COUNT: accessed
(XEN) d157v1: VIRIDIAN VP_ASSIST: pfn: 0x3ffff
(XEN) d157v2: VIRIDIAN VP_ASSIST: pfn: 0x3fffe
(XEN) d157v3: VIRIDIAN VP_ASSIST: pfn: 0x3fffd
(XEN) arch/x86/hvm/irq.c:368: Dom157 PCI link 0 changed 5 -> 0
(XEN) arch/x86/hvm/irq.c:368: Dom157 PCI link 1 changed 10 -> 0
(XEN) arch/x86/hvm/irq.c:368: Dom157 PCI link 2 changed 11 -> 0
(XEN) arch/x86/hvm/irq.c:368: Dom157 PCI link 3 changed 5 -> 0
(XEN) arch/x86/hvm/vmx/vmx.c:3413:d157v0 RDMSR 0x0000019a unimplemented
(XEN) arch/x86/hvm/vmx/vmx.c:3413:d157v0 RDMSR 0x0000019b unimplemented
(XEN) arch/x86/hvm/vmx/vmx.c:3413:d157v2 RDMSR 0x0000019a unimplemented
(XEN) arch/x86/hvm/vmx/vmx.c:3413:d157v2 RDMSR 0x0000019b unimplemented
(XEN) d157v3 VIRIDIAN GUEST_CRASH: 0xa 0xffffffffffffffff 0xe 0 0xfffff80648bbd2b3
So... it just confirms the bugcheck.
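The `GUEST_CRASH` line carries the values Windows reported through the Viridian (Hyper-V enlightenment) crash MSRs. A sketch that pulls them out of an `xl dmesg` line, assuming the first value is the bugcheck code and the remaining four are its parameters in order (0xa is `IRQL_NOT_LESS_OR_EQUAL` in Microsoft's bug check reference); the function name is hypothetical.

```sh
#!/bin/sh
# Sketch: extract the crash details from an "xl dmesg" line such as
# (XEN) d157v3 VIRIDIAN GUEST_CRASH: 0xa 0xffffffffffffffff 0xe 0 0xfffff80648bbd2b3
# Assumption: the first value is the bugcheck code and the next four are
# its parameters, in order. Lines without GUEST_CRASH produce no output.
parse_guest_crash() {
    printf '%s\n' "$1" | awk '
        /VIRIDIAN GUEST_CRASH:/ {
            for (i = 1; i <= NF; i++) if ($i == "GUEST_CRASH:") break
            # The field two before the marker is the dNvM domain/vCPU tag.
            printf "vcpu=%s code=%s params=%s,%s,%s,%s\n", $(i-2), $(i+1), $(i+2), $(i+3), $(i+4), $(i+5)
        }'
}
```

Usage: `xl dmesg | grep 'VIRIDIAN GUEST_CRASH' | while IFS= read -r line; do parse_guest_crash "$line"; done`.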
P.
* Re: xl create/save throwing errors
From: Petr Beneš @ 2025-02-26 3:23 UTC
To: Marek Marczykowski-Górecki
Cc: Jan Beulich, Anthony PERARD, Andrew Cooper, Xen-devel
On Wed, Feb 26, 2025 at 3:29 AM Petr Beneš <w1benny@gmail.com> wrote:
>
> and frustratingly, the generated MEMORY.DMP is corrupted.
>
I finally managed to capture a few non-corrupted crash dumps.
In every case, the crash points to the same symbol:
nt!KiIpiProcessRequests+0x193
Crashdump #1:
00 fffff802`0867ad90 : 00000000`00000061 fffff307`eb40d3f0 00000000`00000000 00000000`00000000 : nt!KiIpiProcessRequests+0x193
01 fffff802`0867aaa7 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiIpiInterruptSubDispatch+0x90
02 fffff802`08566c8e : 00000000`00006000 ffffd801`78e20180 00000000`00000000 00000000`00000000 : nt!KiIpiInterrupt+0x307
03 fffff802`0855a96c : 00000000`00000000 00000000`0000609c 00000000`00000000 00000000`00000000 : nt!MiFlushTbList+0x39e
04 fffff802`0855a304 : 00000000`00000000 00000000`00000000 00000000`00000003 00179800`0000609a : nt!MiReplenishBitMap+0x5bc
05 fffff802`084d3857 : 00179841`0000609b 00000000`00000001 00000000`00000020 00000000`00000000 : nt!MiEmptyPteBins+0x124
06 fffff802`084d2d1c : 00000000`00000000 ffffd66c`00000003 ffff940f`390d7d10 fffff802`084fec14 : nt!MiReservePtes+0x447
07 fffff802`0a8b45b8 : 00000000`00015000 ffff940f`361ea3e0 00000000`00000001 00000000`00000001 : nt!MmMapLockedPagesSpecifyCache+0xcc
08 fffff802`0a8b05df : 00000000`00015000 00000000`0000100c ffff940f`00015000 ffff940f`361ea050 : rdyboost!SMKM_STORE<SMD_TRAITS>::SmStMapPhysicalRegion+0x80
09 fffff802`0a8b0327 : a8d26432`0000100c 00000000`00000000 00000000`00000000 ffff940f`3addd650 : rdyboost!ST_STORE<SMD_TRAITS>::StDmpSinglePageRetrieve+0x22f
0a fffff802`0a8b0066 : ffff940f`361ea000 fffff802`0a8ae3ff 00000000`00000000 00000000`ffffffff : rdyboost!ST_STORE<SMD_TRAITS>::StDmPageRetrieve+0x147
0b fffff802`0a8ae1ee : 00000000`00000080 ffff940f`3addd650 00000000`00000000 00000000`00000000 : rdyboost!ST_STORE<SMD_TRAITS>::StWorkItemProcess+0xa6
0c fffff802`0a8b5be1 : 00000000`00000000 ffffd801`00000000 00000000`00000000 00000000`000001de : rdyboost!SMKM_STORE<SMD_TRAITS>::SmStWorker+0x15e
0d fffff802`085dd715 : ffff940f`361ea000 fffff802`0a8b5bd0 fffff307`eb005f38 0000247f`b19bbdff : rdyboost!SMKM_STORE<SMD_TRAITS>::SmStWorkerThread+0x11
0e fffff802`0867b6ea : ffffd801`78e20180 ffff940f`361d0040 fffff802`085dd6c0 00000000`00000000 : nt!PspSystemThreadStartup+0x55
0f 00000000`00000000 : fffff307`eb40e000 fffff307`eb408000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x2a
Crashdump #2:
00 fffff800`03476d90 : 00000000`00000000 fffff800`05a75db0 00000000`00000000 00000000`00000000 : nt!KiIpiProcessRequests+0x193
01 fffff800`03476aa7 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiIpiInterruptSubDispatch+0x90
02 fffff800`068e1749 : fffff800`024e7180 fffff800`033aa251 00000000`00000000 00000000`00000005 : nt!KiIpiInterrupt+0x307
03 fffff800`05140713 : ffffc787`47d447e0 fffff800`05a75f30 ffff9880`b57716f0 ffff9880`b5179180 : Rtnic64!MPIsr+0x41
04 fffff800`032b19e5 : ffff9880`b5771640 00000000`00000000 fffff800`024e7180 fffff800`024e7180 : ndis!ndisMIsr+0x83
05 fffff800`034718bf : fffff800`05a6ea90 ffff9880`b5771640 00000000`0000fffe fffff800`03476d90 : nt!KiCallInterruptServiceRoutine+0xa5
06 fffff800`03471b87 : ffff9880`b5b7f000 ffffc787`47d1e1a0 fffff800`05185050 00001f80`00350ac0 : nt!KiInterruptSubDispatch+0x11f
07 fffff800`0347130b : ffffc787`47d1e1a0 fffff800`068e192e ffffc787`47d447e0 00000000`00000000 : nt!KiInterruptDispatch+0x37
08 fffff800`0514063a : ffffc787`47751de0 fffff800`068e2ea8 ffffc787`47d44000 ffff078b`23b4f299 : nt!KeSynchronizeExecution+0x5b
09 fffff800`05140208 : ffffc787`47d1e1a0 fffff800`05a6ed40 ffffc787`47d44808 ffffc787`47d447e0 : ndis!ndisMDpcX+0xde
0a fffff800`0331a065 : fffff800`024e9f80 00000000`00000008 fffff800`05a6ecd0 00000000`00000008 : ndis!ndis5InterruptDpc+0x98
0b fffff800`033196bf : 00000000`00000014 00000000`00989680 00000000`0000038a 00000000`000000a2 : nt!KiExecuteAllDpcs+0x305
0c fffff800`034770e5 : 00000000`00000000 fffff800`024e7180 ffff9880`b5771640 000000d6`5dffc510 : nt!KiRetireDpcList+0x1ef
0d fffff800`03476ed0 : 00000000`00000000 fffff800`0320f2cb ffffffff`0000ffff 00000000`00000000 : nt!KxRetireDpcList+0x5
0e fffff800`03476785 : 000000d6`5dffc510 fffff800`03471c01 00000000`00000000 fffffc83`95752780 : nt!KiDispatchInterruptContinue
0f fffff800`03471c01 : 00000000`00000000 fffffc83`95752780 ffff9880`b5771640 fffff800`0387d507 : nt!KiDpcInterruptBypass+0x25
10 fffff800`038a387b : 00000000`00000000 00000000`00000008 00000000`00000008 fffff800`032f4e97 : nt!KiInterruptDispatch+0xb1
11 fffff800`03481915 : ffffffff`fffffffb ffffc787`4cf60040 ffffdc03`00000001 000000d6`5dffcee8 : nt!NtQueryKey+0x34b
12 00007ffc`6dcfc394 : 00007ffc`6b88aad7 000000d6`5dffc630 000000d6`5dffc630 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x25
13 00007ffc`6b88aad7 : 000000d6`5dffc630 000000d6`5dffc630 00000000`00000000 0000025c`00000001 : ntdll!NtQueryKey+0x14
Also, I would like to reiterate that these crashes happen AT THE VERY
MOMENT the "xl save" command is executed. I experimented with delaying
"xl save" by a few seconds, even minutes. The VM always runs fine
until the moment "xl save" is executed; then this crash happens
(randomly).
P.
* Re: xl create/save throwing errors
From: Petr Beneš @ 2025-02-26 3:46 UTC
To: Marek Marczykowski-Górecki
Cc: Jan Beulich, Anthony PERARD, Andrew Cooper, Xen-devel
On Wed, Feb 26, 2025 at 4:23 AM Petr Beneš <w1benny@gmail.com> wrote:
> I finally managed to capture a few non-corrupted crashdumps.
> The cause of crash always points to the same symbol:
> nt!KiIpiProcessRequests+0x193
It appears that Windows likes to manage its own IPIs, i.e.
KiIpiSendRequest stores the request packet in
KPRCB->RequestMailbox, and KiIpiProcessRequests then takes that
request from the RequestMailbox.
If something external (Xen?) interferes with that and triggers an IPI
that Windows doesn't expect, Windows crashes, likely because it
reads an invalid or stale value from the RequestMailbox (one that wasn't
set properly by KiIpiSendRequest).
This is just a wild guess and it might be wrong. But clearly,
something weird that Windows doesn't like is happening around IPIs
during the "xl save" process.
P.
* Re: xl create/save throwing errors
From: Petr Beneš @ 2025-02-26 4:10 UTC
Cc: Jan Beulich, Anthony PERARD, Andrew Cooper, Xen-devel
On Wed, Feb 26, 2025 at 4:46 AM Petr Beneš <w1benny@gmail.com> wrote:
>
> This is just a wild guess and it might be wrong. But clearly,
> something weird is happening around IPI during the xl save process
> that Windows doesn't like.
>
After carefully examining the crash dumps, I have finally found the issue.
I looked at the stacks of the other cores, and in both dumps I found
Shark.sys calling KeIpiGenericCall.
Then it hit me: a long time ago, because of some shenanigans while
developing an unrelated driver, I had installed a tool in the VM to
defuse PatchGuard.
I remembered that the tool installs a "Shark" driver:
https://github.com/9176324/Shark
After removing it from the VM, "xl save" no longer causes problems.
Xen is innocent.
With that said, I'm still seeing the "xl create" errors I
mentioned at the beginning of this thread, e.g.:
libxl: error: libxl_aoutils.c:646:libxl__kill_xs_path: qemu command-line probe already exited
They seem benign: they don't appear to disrupt anything, and the VM is
created normally. But I have no idea why they show up.
P.