From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 15 Jan 2026 08:39:51 -0800
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20260101090516.316883-1-pbonzini@redhat.com>
 <20260115122204.GDaWjb7Npp80GK-mFn@fat_crate.local>
Subject: Re: [PATCH v2 0/4] x86, fpu/kvm: fix crash with AMX
From: Sean Christopherson
To: Paolo Bonzini
Cc: Borislav Petkov, "Kernel Mailing List, Linux", kvm, "the arch/x86 maintainers"
Content-Type: text/plain; charset="us-ascii"

On Thu, Jan 15, 2026, Paolo Bonzini wrote:
> On Thu, Jan 15, 2026, at 13:22, Borislav Petkov wrote:
> >
> > On Thu, Jan 01, 2026 at 10:05:12AM +0100, Paolo Bonzini wrote:
> > > Fix a possible host panic, due to an unexpected #NM, when a KVM guest
> > > is using AMX features.
> > >
> > > The guest's XFD value, which is stored in fpstate->xfd, is used for both
> > > guest execution and host XSAVE operations.
> >
> > This already sounds weird. Why?
>
> Because the state of disabled components is undefined anyway. There's
> no point in making all host XSAVEs more expensive, even when the TMM
> registers aren't in use by the guest (which is going to be most of the
> time, likely).
>
> > Why don't we carry separate XFD copies - guest and host - which we use
> > for the guest and the host, respectively?
>
> That was exactly what I did in v1, but it's more code and less efficient
> too.

And creates a weird ABI for KVM:

 : This also creates a nasty, subtle asymmetry in KVM's ABI.  Notably, the comment
 : above is wrong.  XSAVE does NOT run with fpstate->xfd, it runs with whatever
 : happens to be in hardware.  For non-guest tasks, fpstate->xfd is guaranteed to
 : be resident in hardware when save_fpregs_to_fpstate() runs, but for guest tasks,
 : it will usually be the _guest's_ value.  So in the common case, KVM_GET_XSAVE2
 : would not return the same data set by KVM_SET_XSAVE.
 :
 : In theory we could ensure KVM saved exactly what is resident in hardware, but
 : that's quite tricky (and costly!) as it would require doing xfd_update_state()
 : before _every_ save_fpregs_to_fpstate(), e.g. not just in fpu_swap_kvm_fpstate().
 : E.g. if the host kernel used the FPU from IRQ context (spoiler alert!), then KVM
 : wouldn't have a chance to swap in the maximal XFD[18]=0 value (i.e. the userspace
 : task's XFD).

And IMO papered over the true bug, which is that the xstate snapshot can
become inconsistent relative to KVM's tracking of guest XFD:

 : Lastly, the fix is effectively papering over another bug, which I'm pretty sure
 : is the underlying issue that was originally encountered.  Assuming QEMU doesn't
 : intercept MSR_IA32_XFD for its own purposes, the only sequence I've come up with
 : that would result in KVM trying to load XTILE data with XFD[18]=1, without a
 : colluding userspace VMM (Paolo's selftest) is:
 :
 :  1. vCPU loads non-init XTILE data without ever setting XFD to a non-zero value
 :     (KVM only disables XFD interception on writes with a non-zero value).
 :  2. Guest executes WRMSR(MSR_IA32_XFD) to set XFD[18] = 1
 :  3. VM-Exit due to the WRMSR
 :  4. Host IRQ arrives and triggers kernel_fpu_begin()
 :  5. save_fpregs_to_fpstate() saves guest FPU with XFD[18]=0
 :  6. fpu_update_guest_xfd() stuffs guest_fpu->fpstate->xfd = XFD[18]=1
 :  7. vcpu_enter_guest() attempts to load XTILE data with XFD[18]=1
 :
 : Note!  There's no KVM_SET_XSAVE2 in the above, i.e. this doesn't require userspace
 : to trigger save/restore for live migration or whatever, the only timing condition
 : is the arrival of an IRQ that uses kernel FPU during the XFD 0=>1 VM-Exit.

https://lore.kernel.org/all/aVMEcaZD_SzKzRvr@google.com