From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@bugzilla.kernel.org
To: kvm@vger.kernel.org
Subject: [Bug 209155] KVM Linux guest with more than 1 CPU panics after commit 404d5d7bff0d419fe11c7eaebca9ec8f25258f95 on old CPU (Phenom x4)
Date: Wed, 09 Sep 2020 02:35:19 +0000
X-Bugzilla-Reason: None
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: AssignedTo virtualization_kvm@kernel-bugs.osdl.org
X-Bugzilla-Product: Virtualization
X-Bugzilla-Component: kvm
X-Bugzilla-Version: unspecified
X-Bugzilla-Severity: normal
X-Bugzilla-Who: wanpeng.li@hotmail.com
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P1
X-Bugzilla-Assigned-To: virtualization_kvm@kernel-bugs.osdl.org
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8BIT
X-Bugzilla-URL: https://bugzilla.kernel.org/
Auto-Submitted: auto-generated
MIME-Version: 1.0
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

https://bugzilla.kernel.org/show_bug.cgi?id=209155

--- Comment #11 from Wanpeng Li (wanpeng.li@hotmail.com) ---
(In reply to Sean Christopherson from comment #8)
> From code inspection, I'm 99% confident the immediate bug is that
> svm->next_rip is reset in svm_vcpu_run() only after calling
> svm_exit_handlers_fastpath(), which will cause SVM's
> skip_emulated_instruction() to write a stale RIP. I don't have AMD hardware
> to confirm, but this should be reproducible on modern CPUs by loading
> kvm_amd with nrips=0.
>
> That issue is easy enough to resolve, e.g. simply hoist "svm->next_rip = 0;"
> up above the fastpath handling. But, there are additional complications
> with advancing rip in the fastpath as svm_complete_interrupts() consumes
> rip, e.g. for NMI unmasking logic and event reinjection. Odds are that NMI
> unmasking will never "fail" as it would require the new rip to match the
> last IRET rip, which would be very bizarre. Similarly, event reinjection
> should also be a non-issue in practice as the WRMSR fastpath shouldn't be
> reachable if KVM was injecting an event.
>
> All that being said, IMO, the safest play would be to first yank out the call
> to handle_fastpath_set_msr_irqoff() in svm_exit_handlers_fastpath() to
> ensure a clean base and to provide a safe backport patch, then move
> svm_complete_interrupts() into svm_vcpu_run(), and finally move the call to
> svm_exit_handlers_fastpath() down a ways and reenable
> handle_fastpath_set_msr_irqoff(). Aside from resolving weirdness with rip
> and fastpath, it would also align VMX and SVM with respect to completing
> interrupts.

Hi Sean, thanks for your analysis. I will send out patches to fix it. :)

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
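
For anyone following the thread without the source at hand, the ordering problem described in comment #8 can be modeled with a small user-space sketch. This is not the kernel code: struct toy_vcpu, the toy_* helpers, and the instruction lengths are all hypothetical and only mimic the control flow (a cached next_rip that is trusted when non-zero, and a reset that happens either before or after the fastpath).

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-vCPU state; NOT the kernel's struct vcpu_svm. */
struct toy_vcpu {
    uint64_t rip;       /* guest RIP at the current exit */
    uint64_t next_rip;  /* cached "next RIP"; 0 means none is available */
};

/* Assumed length of the exiting instruction in the fastpath (WRMSR is 2 bytes). */
#define TOY_WRMSR_LEN 2

/* Mimics the skip logic: trust next_rip when non-zero, else advance by decoding. */
static void toy_skip_insn(struct toy_vcpu *v)
{
    if (v->next_rip)
        v->rip = v->next_rip;
    else
        v->rip += TOY_WRMSR_LEN;
}

/*
 * A slow-path exit whose handler computes next_rip itself and skips the
 * instruction. The cached value is left behind until the next run clears it.
 */
static void toy_slow_exit(struct toy_vcpu *v, uint64_t exit_rip, uint64_t len)
{
    v->rip = exit_rip;
    v->next_rip = exit_rip + len;
    toy_skip_insn(v);
}

/*
 * A later exit handled in the fastpath. With hoist_reset == 0 the reset
 * happens only after the fastpath (the ordering described as buggy); with
 * hoist_reset != 0 it happens before (the suggested minimal fix).
 * Returns the RIP the guest would resume at.
 */
static uint64_t toy_fastpath_exit(struct toy_vcpu *v, uint64_t exit_rip,
                                  int hoist_reset)
{
    v->rip = exit_rip;
    if (hoist_reset)
        v->next_rip = 0;
    toy_skip_insn(v);   /* fastpath skips the WRMSR */
    v->next_rip = 0;    /* position of the reset in the buggy ordering */
    return v->rip;
}
```

In this model, calling toy_slow_exit() at 0x1000 with length 3 and then toy_fastpath_exit() at 0x2000 resumes at the stale 0x1003 when the reset is not hoisted, and at 0x2002 when it is. It says nothing about the svm_complete_interrupts() interactions also raised above.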