Date: Wed, 29 Jan 2025 06:00:29 -0800
X-Mailing-List: linux-kernel@vger.kernel.org
References: <20250128213652.1880545-1-vannapurve@google.com>
 <4e07bbe6-9f74-45e5-b8d4-f992d2be78fc@intel.com>
Subject: Re: [PATCH 1/1] x86/tdx: Route safe halt execution via tdx_safe_halt
From: Sean Christopherson
To: "Kirill A. Shutemov"
Cc: Vishal Annapurve, Dave Hansen, x86@kernel.org,
 linux-kernel@vger.kernel.org, pbonzini@redhat.com, erdemaktas@google.com,
 ackerleytng@google.com, jxgao@google.com, sagis@google.com,
 oupton@google.com, pgonda@google.com, dave.hansen@linux.intel.com,
 linux-coco@lists.linux.dev, chao.p.peng@linux.intel.com,
 isaku.yamahata@gmail.com

On Wed, Jan 29, 2025, Kirill A. Shutemov wrote:
> On Tue, Jan 28, 2025 at 04:45:35PM -0800, Sean Christopherson wrote:
> > This incorrectly assumes the hypervisor is intercepting HLT. If the VM is given
> > a slice of hardware, HLT-exiting may be disabled, in which case it's desirable
> > for the guest to natively execute HLT, as the latencies to get in and out of "HLT"
> > are lower, especially for TDX guests. Such a VM would hopefully have MONITOR/MWAIT
> > available as well, but even if that were the case, the admin could select HLT for
> > idling.
> >
> > Ugh, and I see that bfe6ed0c6727 ("x86/tdx: Add HLT support for TDX guests")
> > overrides default_idle(). The kernel really shouldn't do that, because odds are
> > decent that any TDX guest will have direct access to HLT. The best approach I
> > can think of would be to patch x86_idle() to tdx_safe_halt() if and only if a HLT
> > #VE is taken. The tricky part would be delaying the update until it's safe to do
> > so.
>
> I am confused. HLT triggers #VE unconditionally in TDX guests. How would
> TDX guest have direct access to HLT?

Gah, you're not confused, I am. I was thinking of the SEV-ES model where
intercepts are morphed to #VC.

> Even if it would in the future, it is going to explicit opt-in from the
> guest and we can avoid setting x86_idle() for such cases.

Or explicit enumeration from the TDX module.

> > As for taking a #VE, the exception itself is fine (assuming the kernel isn't off
> > the rails and using a trap gate :-D). The issue is likely that RFLAGS.IF=1 on
> > the stack, and so the call to cond_local_irq_enable() enables IRQs before making
> > the hypercall. E.g. no one has complained about #VC, because exc_vmm_communication()
> > doesn't enable IRQs.
> >
> > Off the top of my head, I can't think of any flows that would do HLT with IRQs
> > fully enabled. Even PV spinlocks use safe_halt(), e.g. in kvm_wait(), so I don't
> > think there's any value in trying to precisely identify that it's a safe HLT?
>
> I can only think of "CPU is dead" use-case of HLT where interrupts are
> enabled. But I hate special-casing HLT in exc_virtualization_exception() :/

Ignore me, overriding at boot time is the way to go.

> > E.g. this should fix the immediate problem, and then ideally someone would make
> > TDX guests play nice with native HLT.
>
> I've asked (some time ago) TDX module folks to provide interruptibility
> state as part of the guest so we can handle STI shadow properly, not as a
> hack around HLT.
>
> The immediate problem can be addressed by fixing the BIOS to not advertise
> C-states (if I read the situation right).

No, something like Vishal proposed is a better fix. It's still desirable for the
vCPU to call out to the hypervisor when going idle, otherwise a vCPU that is idle
for an extended duration will never let the pCPU go idle.
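
For readers following along, the ordering problem discussed above (IRQs getting
enabled before the halt hypercall is made) can be sketched as a toy user-space
model. This is only an illustration under stated assumptions, not kernel code:
struct vcpu, halt_via_trap(), halt_via_hypercall() and the other names below are
hypothetical, and the hypervisor is reduced to "block unless an interrupt is
pending".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one vCPU. All names are hypothetical, not kernel APIs. */
struct vcpu {
	bool irqs_enabled;	/* models RFLAGS.IF */
	bool irq_pending;	/* a wake interrupt is waiting for delivery */
	bool wake_handled;	/* the interrupt handler has already run */
	bool blocked;		/* the hypervisor put the vCPU to sleep */
};

/* Deliver the pending interrupt, but only if IRQs are enabled. */
static void deliver_pending(struct vcpu *v)
{
	if (v->irqs_enabled && v->irq_pending) {
		v->irq_pending = false;
		v->wake_handled = true;
	}
}

/* Hypervisor side of the halt request: block only if nothing is pending. */
static void halt_hypercall(struct vcpu *v)
{
	v->blocked = !v->irq_pending;
}

/*
 * Path A: HLT traps, and the exception handler re-enables IRQs (the saved
 * IF is 1) before it issues the hypercall.
 */
static void halt_via_trap(struct vcpu *v)
{
	v->irqs_enabled = true;
	deliver_pending(v);	/* the wake IRQ is consumed here ... */
	halt_hypercall(v);	/* ... so the hypervisor sees nothing pending */
}

/* Path B: make the hypercall with IRQs still off, enable IRQs afterwards. */
static void halt_via_hypercall(struct vcpu *v)
{
	halt_hypercall(v);	/* the pending IRQ counts as a wake event */
	v->irqs_enabled = true;
	deliver_pending(v);
}

/* One idle attempt in which a wake IRQ arrives just before the halt. */
static struct vcpu run_idle(void (*halt)(struct vcpu *))
{
	struct vcpu v = { .irqs_enabled = false, .irq_pending = true };

	halt(&v);
	return v;
}

/* Self-check of the model: only path A sleeps through the wake event. */
void check_model(void)
{
	assert(run_idle(halt_via_trap).blocked);
	assert(!run_idle(halt_via_hypercall).blocked);
}
```

In this model, path A ends with the vCPU blocked even though the wake interrupt
was already handled, while path B returns from the halt immediately and then
handles the interrupt. Path B still tells the hypervisor that the vCPU is going
idle, which is the property called out above as desirable.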