* [Bug 209155] KVM Linux guest with more than 1 CPU panics after commit 5a9f54435a48... on old CPU (Phenom x4)
From: bugzilla-daemon @ 2020-09-04 22:47 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=209155
--- Comment #3 from Paul K. (kronenpj@kronenpj.dyndns.org) ---
$ git bisect log
# bad: [bcf876870b95592b52519ed4aafcf9d95999bc9c] Linux 5.8
git bisect start 'v5.8'
# good: [3d77e6a8804abcc0504c904bd6e5cdf3a5cf8162] Linux 5.7
git bisect good 3d77e6a8804abcc0504c904bd6e5cdf3a5cf8162
# bad: [694b5a5d313f3997764b67d52bab66ec7e59e714] Merge tag 'arm-soc-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
git bisect bad 694b5a5d313f3997764b67d52bab66ec7e59e714
# bad: [694b5a5d313f3997764b67d52bab66ec7e59e714] Merge tag 'arm-soc-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
git bisect bad 694b5a5d313f3997764b67d52bab66ec7e59e714
# bad: [694b5a5d313f3997764b67d52bab66ec7e59e714] Merge tag 'arm-soc-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
git bisect bad 694b5a5d313f3997764b67d52bab66ec7e59e714
# bad: [694b5a5d313f3997764b67d52bab66ec7e59e714] Merge tag 'arm-soc-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
git bisect bad 694b5a5d313f3997764b67d52bab66ec7e59e714
# bad: [694b5a5d313f3997764b67d52bab66ec7e59e714] Merge tag 'arm-soc-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
git bisect bad 694b5a5d313f3997764b67d52bab66ec7e59e714
# bad: [2e63f6ce7ed2c4ff83ba30ad9ccad422289a6c63] Merge branch 'uaccess.comedi' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
git bisect bad 2e63f6ce7ed2c4ff83ba30ad9ccad422289a6c63
# good: [cfa3b8068b09f25037146bfd5eed041b78878bee] Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
git bisect good cfa3b8068b09f25037146bfd5eed041b78878bee
# good: [c41219fda6e04255c44d37fd2c0d898c1c46abf1] Merge tag 'drm-intel-next-fixes-2020-05-20' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
git bisect good c41219fda6e04255c44d37fd2c0d898c1c46abf1
# good: [f3cdc8ae116e27d84e1f33c7a2995960cebb73ac] Merge tag 'for-5.8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
git bisect good f3cdc8ae116e27d84e1f33c7a2995960cebb73ac
# good: [f1e455352b6f503532eb3637d0a6d991895e7856] Merge tag 'kgdb-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux
git bisect good f1e455352b6f503532eb3637d0a6d991895e7856
# bad: [cb953129bfe5c0f2da835a0469930873fb7e71df] kvm: add halt-polling cpu usage stats
git bisect bad cb953129bfe5c0f2da835a0469930873fb7e71df
# good: [3754afe7cf7cc3693a9c9ff795e9bd97175ca639] tools/kvm_stat: Add command line switch '-L' to log to file
git bisect good 3754afe7cf7cc3693a9c9ff795e9bd97175ca639
# good: [c4e115f08c08cb9f3b70247b42323e40b9afd1fd] kvm/eventfd: remove unneeded conversion to bool
git bisect good c4e115f08c08cb9f3b70247b42323e40b9afd1fd
# good: [5b494aea13fe9ec67365510c0d75835428cbb303] KVM: No need to retry for hva_to_pfn_remapped()
git bisect good 5b494aea13fe9ec67365510c0d75835428cbb303
# bad: [379a3c8ee44440d5afa505230ed8cb5b0d0e314b] KVM: VMX: Optimize posted-interrupt delivery for timer fastpath
git bisect bad 379a3c8ee44440d5afa505230ed8cb5b0d0e314b
# good: [9e826feb8f114964cbdce026340b6cb9bde68a18] KVM: nVMX: Drop superfluous VMREAD of vmcs02.GUEST_SYSENTER_*
git bisect good 9e826feb8f114964cbdce026340b6cb9bde68a18
# good: [2c4c41325540cf3abb12aef142c0e550f6afeffc] KVM: x86: Print symbolic names of VMX VM-Exit flags in traces
git bisect good 2c4c41325540cf3abb12aef142c0e550f6afeffc
# bad: [404d5d7bff0d419fe11c7eaebca9ec8f25258f95] KVM: X86: Introduce more exit_fastpath_completion enum values
git bisect bad 404d5d7bff0d419fe11c7eaebca9ec8f25258f95
# good: [5a9f54435a488f8a1153efd36cccee3e7e0fc28b] KVM: X86: Introduce kvm_vcpu_exit_request() helper
git bisect good 5a9f54435a488f8a1153efd36cccee3e7e0fc28b
# first bad commit: [404d5d7bff0d419fe11c7eaebca9ec8f25258f95] KVM: X86: Introduce more exit_fastpath_completion enum values
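For anyone who wants to re-check this result, here is a minimal shell sketch. It assumes the log above has been saved to a file inside a kernel checkout; the file name `bisect.log` and the helper name `first_bad_commit` are illustrative and do not come from the report. `git bisect replay` is the stock git subcommand for re-running a saved log.

```shell
#!/bin/sh
# Hypothetical helper: print the hash from the "# first bad commit" line
# of a `git bisect log` dump read on stdin.
first_bad_commit() {
    sed -n 's/^# first bad commit: \[\([0-9a-f]\{40\}\)\].*/\1/p'
}

# Usage (assumes the log was saved as bisect.log in a kernel checkout):
#   first_bad_commit < bisect.log
#   git bisect replay bisect.log    # re-run the recorded good/bad decisions
```

The helper only reads the comment line that git writes at the end of a finished bisection, so it prints nothing for a log that is still in progress.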
--
You are receiving this mail because:
You are watching the assignee of the bug.
* [Bug 209155] KVM Linux guest with more than 1 CPU panics after commit 404d5d7bff0d419fe11c7eaebca9ec8f25258f95 on old CPU (Phenom x4)
From: bugzilla-daemon @ 2020-09-08 0:31 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=209155
Wanpeng Li (wanpeng.li@hotmail.com) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |wanpeng.li@hotmail.com
--- Comment #4 from Wanpeng Li (wanpeng.li@hotmail.com) ---
Could you dump the lscpu results in both the guest and the host? In addition,
could you dump the result from grep . /sys/module/kvm_amd/parameters/* ?
* [Bug 209155] KVM Linux guest with more than 1 CPU panics after commit 404d5d7bff0d419fe11c7eaebca9ec8f25258f95 on old CPU (Phenom x4)
From: bugzilla-daemon @ 2020-09-08 17:08 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=209155
Sean Christopherson (sean.j.christopherson@intel.com) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |sean.j.christopherson@intel.com
--- Comment #8 from Sean Christopherson (sean.j.christopherson@intel.com) ---
From code inspection, I'm 99% confident the immediate bug is that svm->next_rip
is reset in svm_vcpu_run() only after calling svm_exit_handlers_fastpath(),
which will cause SVM's skip_emulated_instruction() to write a stale RIP. I
don't have AMD hardware to confirm, but this should be reproducible on modern
CPUs by loading kvm_amd with nrips=0.
That issue is easy enough to resolve, e.g. simply hoist "svm->next_rip = 0;" up
above the fastpath handling. But, there are additional complications with
advancing rip in the fastpath as svm_complete_interrupts() consumes rip, e.g.
for NMI unmasking logic and event reinjection. Odds are that NMI unmasking
will never "fail" as it would require the new rip to match the last IRET rip,
which would be very bizarre. Similarly, event reinjection should also be a
non-issue in practice as the WRMSR fastpath shouldn't be reachable if KVM was
injecting an event.
All that being said, IMO, the safest play would be to first yank out the call to
handle_fastpath_set_msr_irqoff() in svm_exit_handlers_fastpath() to ensure a
clean base and to provide a safe backport patch, then move
svm_complete_interrupts() into svm_vcpu_run(), and finally move the call to
svm_exit_handlers_fastpath() down a ways and reenable
handle_fastpath_set_msr_irqoff(). Aside from resolving weirdness with rip and
fastpath, it would also align VMX and SVM with respect to completing
interrupts.
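The reproduction suggested above (loading kvm_amd with nrips=0 on a newer CPU) could be scripted roughly as follows. This is a sketch under assumptions: the helper names `run` and `reload_kvm_amd_nrips` and the `DRY_RUN` variable are made up for illustration, the real reload needs root, and every guest has to be shut down first so the module can be unloaded.

```shell
#!/bin/sh
# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: reload kvm_amd with the given nrips value (default 0).
reload_kvm_amd_nrips() {
    run modprobe -r kvm_amd &&
        run modprobe kvm_amd "nrips=${1:-0}"
}

# Usage:
#   DRY_RUN=1 reload_kvm_amd_nrips 0   # show what would be run
#   reload_kvm_amd_nrips 0             # actually reload, as root
#   cat /sys/module/kvm_amd/parameters/nrips
```

After the reload, booting a guest with more than one vCPU on an affected kernel should show whether the panic from the original report appears.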
* [Bug 209155] KVM Linux guest with more than 1 CPU panics after commit 404d5d7bff0d419fe11c7eaebca9ec8f25258f95 on old CPU (Phenom x4)
From: bugzilla-daemon @ 2020-09-09 2:35 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=209155
--- Comment #11 from Wanpeng Li (wanpeng.li@hotmail.com) ---
(In reply to Sean Christopherson from comment #8)
> From code inspection, I'm 99% confident the immediate bug is that
> svm->next_rip is reset in svm_vcpu_run() only after calling
> svm_exit_handlers_fastpath(), which will cause SVM's
> skip_emulated_instruction() to write a stale RIP. I don't have AMD hardware
> to confirm, but this should be reproducible on modern CPUs by loading
> kvm_amd with nrips=0.
>
> That issue is easy enough to resolve, e.g. simply hoist "svm->next_rip = 0;"
> up above the fastpath handling. But, there are additional complications
> with advancing rip in the fastpath as svm_complete_interrupts() consumes
> rip, e.g. for NMI unmasking logic and event reinjection. Odds are that NMI
> unmasking will never "fail" as it would require the new rip to match the
> last IRET rip, which would be very bizarre. Similarly, event reinjection
> should also be a non-issue in practice as the WRMSR fastpath shouldn't be
> reachable if KVM was injecting an event.
>
> All that being said, IMO, the safest play would be to first yank out the call
> to handle_fastpath_set_msr_irqoff() in svm_exit_handlers_fastpath() to
> ensure a clean base and to provide a safe backport patch, then move
> svm_complete_interrupts() into svm_vcpu_run(), and finally move the call to
> svm_exit_handlers_fastpath() down a ways and reenable
> handle_fastpath_set_msr_irqoff(). Aside from resolving weirdness with rip
> and fastpath, it would also align VMX and SVM with respect to completing
> interrupts.
Hi Sean, thanks for your analysis. I will send out patches to fix it. :)
* [Bug 209155] KVM Linux guest with more than 1 CPU panics after commit 404d5d7bff0d419fe11c7eaebca9ec8f25258f95 on old CPU (Phenom x4)
From: bugzilla-daemon @ 2020-09-09 16:20 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=209155
--- Comment #12 from Paul K. (kronenpj@kronenpj.dyndns.org) ---
Verified fix works in both the bisected revision and v5.9-rc3. I have not tried
to apply the three patches sent to the mailing list. Should I?
* [Bug 209155] KVM Linux guest with more than 1 CPU panics after commit 404d5d7bff0d419fe11c7eaebca9ec8f25258f95 on old CPU (Phenom x4)
From: bugzilla-daemon @ 2020-09-10 0:14 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=209155
--- Comment #13 from Wanpeng Li (wanpeng.li@hotmail.com) ---
(In reply to Paul K. from comment #12)
> Verified fix works in both the bisected revision and v5.9-rc3. I have not
> tried to apply the three patches sent to the mailing list. Should I?
Please give them a try; we tested these three patches on AMD ROME with nrips=0.
* [Bug 209155] KVM Linux guest with more than 1 CPU panics after commit 404d5d7bff0d419fe11c7eaebca9ec8f25258f95 on old CPU (Phenom x4)
From: bugzilla-daemon @ 2020-09-10 12:12 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=209155
--- Comment #14 from Paul K. (kronenpj@kronenpj.dyndns.org) ---
I'm having a bit of a problem applying the patches cleanly. Both v5.9-rc3 and
v5.9-rc4 give the same result:
Patch 1/3 goes fine:
$ patch -p1 < /net/phenom/export/home2/users/kronenpj/tmp/patch1-3.txt
patching file arch/x86/kvm/svm/svm.c
Patch 2/3 fails on hunk #3:
$ patch -p1 < /net/phenom/export/home2/users/kronenpj/tmp/patch2-3.txt
patching file arch/x86/kvm/svm/svm.c
Hunk #1 succeeded at 3349 (offset 2 lines).
Hunk #2 succeeded at 3504 (offset 4 lines).
Hunk #3 FAILED at 3533.
1 out of 3 hunks FAILED -- saving rejects to file arch/x86/kvm/svm/svm.c.rej
$ cat svm.c.rej
--- arch/x86/kvm/svm/svm.c
+++ arch/x86/kvm/svm/svm.c
@@ -3533,6 +3537,7 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
svm_handle_mce(svm);
svm_complete_interrupts(svm);
+ exit_fastpath = svm_exit_handlers_fastpath(vcpu);
vmcb_mark_all_clean(svm->vmcb);
return exit_fastpath;
Adding that line manually and continuing with the third patch:
$ patch -p1 < /net/phenom/export/home2/users/kronenpj/tmp/patch3-3.txt
patching file arch/x86/kvm/svm/svm.c
Hunk #2 succeeded at 3536 with fuzz 2 (offset 8 lines).
The patch against v5.9-rc4+ works as expected.
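A rejected hunk like the one above can be caught before the tree is modified by doing a dry run first. The sketch below assumes GNU patch; the helper name `apply_if_clean` is illustrative, and the file names in the usage comment are shortened from the paths in this report.

```shell
#!/bin/sh
# Hypothetical helper: apply a patch only if a dry run succeeds first.
# -N (--forward) tells patch not to offer reversing a patch that looks
# already applied.
apply_if_clean() {
    if patch -p1 -N --dry-run < "$1" >/dev/null 2>&1; then
        patch -p1 -N < "$1"
    else
        echo "would not apply cleanly: $1" >&2
        return 1
    fi
}

# Usage, from the top of the kernel tree:
#   for p in patch1-3.txt patch2-3.txt patch3-3.txt; do
#       apply_if_clean "$p" || break
#   done
```

Each patch is applied before the next one is checked, so later patches in a series are tested against the tree the earlier ones produce. A dry run still succeeds when hunks apply with an offset or fuzz, so the output of the real `patch` run is worth reading.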