* [Confusing Bug] A Long-running Syzkaller Docker Crashes Host System
From: Zhiyu Zhang @ 2025-04-09 13:24 UTC
To: seanjc, pbonzini, syzkaller, kvm, tglx

Dear Syzkaller Group and Linux Kernel Upstream,

I am writing to report an intermittent issue that appears when running
Syzkaller inside a Docker container with privileged KVM access. The
host system becomes unresponsive after prolonged fuzzing, and I hope
your insights can help identify the root cause.

Environment Details:
- Host Machine:
  - OS: Ubuntu 20.04.6 LTS
  - Kernel: x86_64 Linux 5.15.0-136-generic
  - CPU: Intel Xeon Platinum 8268 @ 192×3.9GHz
- Docker Container:
  - Base Image: Ubuntu 22.04 (qgrain/kernel-fuzz:v1)
  - Syzkaller Version: commit 4121cf9 (20250217)
  - Startup Command:
      docker run -itd -p 29400:22 -v /PATH/KERNELS:/root/kernels \
        --name NAME --privileged=true qgrain/kernel-fuzz:v1

After the fuzzing instances had been running for an extended period,
the host system became completely inaccessible (e.g., SSH connections
failed). Through IPMI, I observed the following repeated log messages
on the virtual terminal:
[244053.888249] kvm [3867]: vcpu2, guest pF: 0xffffffff813008ac
vmx_set_jsr: BTF ln_inA2_DEBUGTRLSR 0x2, nop
[244053.938264] kvm [3867]: vcpu3, guest pF: 0xffffffff813008ac
vmx_set_jsr: BTF ln_inA2_DEBUGTRLSR 0x2, nop
[244053.960191] kvm [3867]: vcpu0, guest pF: 0xffffffff813008ac
vmx_set_jsr: BTF ln_inA2_DEBUGTRLSR 0x2, nop
[244053.992411] kvm [3867]: vcpu1, guest pF: 0xffffffff813008ac
vmx_set_jsr: BTF ln_inA2_DEBUGTRLSR 0x2, nop
[244075.149293] kvm [3882]: vcpu3, guest pF: 0xffffffff81300744
vmx_set_jsr: BTF ln_inA2_DEBUGTRLSR 0x2, nop
...
Speculation on Possible Causes:
- One possibility is that the long-term Syzkaller fuzzing workload has
generated test cases that trigger an edge-case bug in the host KVM
module. The repeated “guest pF” errors could indicate that a specific
sequence of guest instructions is not being handled correctly.
- Alternatively, prolonged high-load conditions from continuous
fuzzing might have exposed an unhandled kernel or hardware bug related
to virtualization—potentially in the CPU’s VMX or within the KVM
module itself.

I apologize for the limited diagnostic information available at this
time (I found nothing relevant to KVM in the system logs). The above
speculation is preliminary, and I am unsure whether the root cause
lies on the Syzkaller side or the kernel KVM side.

Thank you for your attention to this matter. I look forward to any
suggestions or questions you may have.
Best regards,
Zhiyu Zhang

* Re: [Confusing Bug] A Long-running Syzkaller Docker Crashes Host System
From: Sean Christopherson @ 2025-04-09 20:41 UTC
To: Zhiyu Zhang; +Cc: pbonzini, syzkaller, kvm, tglx
On Wed, Apr 09, 2025, Zhiyu Zhang wrote:
> Dear Syzkaller Group and Linux Kernel Upstream,
>
> I am writing to report an intermittent issue that appears when running
> Syzkaller inside a Docker container with privileged KVM access. The
> host system becomes unresponsive after prolonged fuzzing, and I hope
> your insights can help identify the root cause.
>
> Environment Details:
> - Host Machine:
>   - OS: Ubuntu 20.04.6 LTS
>   - Kernel: x86_64 Linux 5.15.0-136-generic
>   - CPU: Intel Xeon Platinum 8268 @ 192×3.9GHz
> - Docker Container:
>   - Base Image: Ubuntu 22.04 (qgrain/kernel-fuzz:v1)
>   - Syzkaller Version: commit 4121cf9 (20250217)
>   - Startup Command:
>       docker run -itd -p 29400:22 -v /PATH/KERNELS:/root/kernels \
>         --name NAME --privileged=true qgrain/kernel-fuzz:v1
>
> After the fuzzing instances had been running for an extended period,
> the host system became completely inaccessible (e.g., SSH connections
> failed). Through IPMI, I observed the following repeated log messages
> on the virtual terminal:
You'll likely need some way to get more information about the state of the host
kernel when things go sideways. E.g. force a crash and get a kdump. Either that,
or hope you get lucky and capture an oops/panic.
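
A minimal sketch of checking whether the host is even set up to leave a
dump behind, assuming a stock Linux procfs/sysfs layout. A kdump is only
captured if a crash kernel is loaded and the failure actually escalates to
a panic, so the snippet reads the knobs that control that. The function
name `crash_capture_report` and its `root` parameter are hypothetical,
introduced here only so the check can be pointed at a test directory.

```python
from pathlib import Path

# Standard Linux knobs. A "1" in the *_panic files turns the corresponding
# condition into a panic, which is what hands control to the crash kernel.
KNOBS = {
    "crash_kernel_loaded": "sys/kernel/kexec_crash_loaded",
    "panic_on_oops": "proc/sys/kernel/panic_on_oops",
    "softlockup_panic": "proc/sys/kernel/softlockup_panic",
    "hardlockup_panic": "proc/sys/kernel/hardlockup_panic",
    "hung_task_panic": "proc/sys/kernel/hung_task_panic",
    "sysrq": "proc/sys/kernel/sysrq",
}


def crash_capture_report(root="/"):
    """Return {knob: value or None}; None means the file is absent or unreadable."""
    report = {}
    for name, rel in KNOBS.items():
        try:
            report[name] = (Path(root) / rel).read_text().strip()
        except OSError:
            # Some knobs only exist when the matching kernel option is built in.
            report[name] = None
    return report


if __name__ == "__main__":
    for name, value in crash_capture_report().items():
        shown = value if value is not None else "(not available)"
        print(f"{name:22} {shown}")
```

If `crash_kernel_loaded` reads 0, a forced crash would not produce a dump.
If `sysrq` permits it, writing `c` to `/proc/sysrq-trigger` is one common
way to force the crash by hand.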
> [244053.888249] kvm [3867]: vcpu2, guest pF: 0xffffffff813008ac
> vmx_set_jsr: BTF ln_inA2_DEBUGTRLSR 0x2, nop
> [244053.938264] kvm [3867]: vcpu3, guest pF: 0xffffffff813008ac
> vmx_set_jsr: BTF ln_inA2_DEBUGTRLSR 0x2, nop
> [244053.960191] kvm [3867]: vcpu0, guest pF: 0xffffffff813008ac
> vmx_set_jsr: BTF ln_inA2_DEBUGTRLSR 0x2, nop
> [244053.992411] kvm [3867]: vcpu1, guest pF: 0xffffffff813008ac
> vmx_set_jsr: BTF ln_inA2_DEBUGTRLSR 0x2, nop
> [244075.149293] kvm [3882]: vcpu3, guest pF: 0xffffffff81300744
> vmx_set_jsr: BTF ln_inA2_DEBUGTRLSR 0x2, nop
What is producing these messages? It's not the upstream kernel. If your system
is generating gobs of logging, it's entirely possible the logging itself is
causing problems.
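
To gauge whether log volume is a plausible factor, the message rate can be
estimated from the timestamps. This is a minimal sketch, assuming lines
shaped like the excerpt quoted above (`[seconds.micros] kvm [pid]: ...`).
The helper name `kvm_message_rate` is hypothetical.

```python
import re

# Matches the leading "[244053.888249] kvm [3867]:" part of each message.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+kvm \[\d+\]:")


def kvm_message_rate(lines):
    """Return (count, messages_per_second) for kvm-tagged log lines.

    The rate is (count - 1) intervals divided by the time between the
    first and last matching timestamp; it is 0.0 with fewer than two hits.
    """
    stamps = [float(m.group(1)) for m in map(LINE_RE.match, lines) if m]
    if len(stamps) < 2:
        return len(stamps), 0.0
    span = max(stamps) - min(stamps)
    if span <= 0:
        return len(stamps), float("inf")
    return len(stamps), (len(stamps) - 1) / span


# First lines of the four messages that share PID 3867 in the excerpt.
sample = [
    "[244053.888249] kvm [3867]: vcpu2, guest pF: 0xffffffff813008ac",
    "[244053.938264] kvm [3867]: vcpu3, guest pF: 0xffffffff813008ac",
    "[244053.960191] kvm [3867]: vcpu0, guest pF: 0xffffffff813008ac",
    "[244053.992411] kvm [3867]: vcpu1, guest pF: 0xffffffff813008ac",
]
count, rate = kvm_message_rate(sample)
print(f"{count} messages, about {rate:.0f} per second within the burst")
```

A four-line excerpt only describes one burst. Running the same function over
the full captured console log would show whether the volume is sustained.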