Date: Wed, 29 Jan 2025 05:55:48 -0800
X-Mailing-List: linux-kernel@vger.kernel.org
References: <20250128213652.1880545-1-vannapurve@google.com> <4e07bbe6-9f74-45e5-b8d4-f992d2be78fc@intel.com>
Subject: Re: [PATCH 1/1] x86/tdx: Route safe halt execution via tdx_safe_halt
From: Sean Christopherson
To: "Kirill A. Shutemov"
Cc: Vishal Annapurve, Dave Hansen, x86@kernel.org, linux-kernel@vger.kernel.org,
	pbonzini@redhat.com, erdemaktas@google.com, ackerleytng@google.com,
	jxgao@google.com, sagis@google.com, oupton@google.com, pgonda@google.com,
	dave.hansen@linux.intel.com, linux-coco@lists.linux.dev,
	chao.p.peng@linux.intel.com, isaku.yamahata@gmail.com
Content-Type: text/plain; charset="us-ascii"

On Wed, Jan 29, 2025, Kirill A. Shutemov wrote:
> On Tue, Jan 28, 2025 at 04:45:35PM -0800, Sean Christopherson wrote:
> > This incorrectly assumes the hypervisor is intercepting HLT.  If the VM is given
> > a slice of hardware, HLT-exiting may be disabled, in which case it's desirable
> > for the guest to natively execute HLT, as the latencies to get in and out of "HLT"
> > are lower, especially for TDX guests.  Such a VM would hopefully have MONITOR/MWAIT
> > available as well, but even if that were the case, the admin could select HLT for
> > idling.
> >
> > Ugh, and I see that bfe6ed0c6727 ("x86/tdx: Add HLT support for TDX guests")
> > overrides default_idle().  The kernel really shouldn't do that, because odds are
> > decent that any TDX guest will have direct access to HLT.  The best approach I
> > can think of would be to patch x86_idle() to tdx_safe_halt() if and only if a HLT
> > #VE is taken.  The tricky part would be delaying the update until it's safe to do
> > so.
>
> I am confused. HLT triggers #VE unconditionally in TDX guests. How would
> a TDX guest have direct access to HLT?

Gah, you're not confused, I am.  I was thinking of the SEV-ES model where
intercepts are morphed to #VC.

> Even if it would in the future, it is going to be an explicit opt-in from the
> guest and we can avoid setting x86_idle() for such cases.

Or explicit enumeration from the TDX module.

> > As for taking a #VE, the exception itself is fine (assuming the kernel isn't off
> > the rails and using a trap gate :-D).  The issue is likely that RFLAGS.IF=1 on
> > the stack, and so the call to cond_local_irq_enable() enables IRQs before making
> > the hypercall.  E.g. no one has complained about #VC, because exc_vmm_communication()
> > doesn't enable IRQs.
> >
> > Off the top of my head, I can't think of any flows that would do HLT with IRQs
> > fully enabled.  Even PV spinlocks use safe_halt(), e.g. in kvm_wait(), so I don't
> > think there's any value in trying to precisely identify that it's a safe HLT?
>
> I can only think of the "CPU is dead" use-case of HLT where interrupts are
> enabled. But I hate special-casing HLT in exc_virtualization_exception() :/

Ignore me, overriding at boot time is the way to go.

> > E.g. this should fix the immediate problem, and then ideally someone would make
> > TDX guests play nice with native HLT.
>
> I've asked (some time ago) TDX module folks to provide interruptibility
> state as part of the guest so we can handle STI shadow properly, not as a
> hack around HLT.
>
> The immediate problem can be addressed by fixing the BIOS to not advertise
> C-states (if I read the situation right).

No, something like what Vishal proposed is a better fix.  It's still desirable
for the vCPU to call out to the hypervisor when going idle, otherwise a vCPU that
is idle for an extended duration will never let the pCPU go idle.
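
For illustration only, here is a rough user-space model of the "override at boot time" idea discussed above.  It is not kernel code and not Vishal's patch: struct idle_ops, model_boot_setup(), model_idle() and the halt_path enum are hypothetical names invented for this sketch, and the two halt functions are stand-ins that merely report which path was taken instead of executing HLT or making a hypercall.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Which path the modeled vCPU took when it went idle. */
enum halt_path { HALT_NONE, HALT_NATIVE, HALT_HYPERCALL };

/* Hypothetical ops table; stands in for whatever hook the kernel uses. */
struct idle_ops {
	enum halt_path (*safe_halt)(void);
};

/* Stand-in for the native "sti; hlt" sequence. */
enum halt_path native_safe_halt(void)
{
	return HALT_NATIVE;
}

/* Stand-in for a hypercall-based halt in the spirit of tdx_safe_halt(). */
enum halt_path hypercall_safe_halt(void)
{
	return HALT_HYPERCALL;
}

/* Default to the native path, as on bare metal. */
struct idle_ops idle_ops = { .safe_halt = native_safe_halt };

/*
 * Hypothetical boot hook: choose the halt routine once, up front, so the
 * idle path never executes an instruction that would take a #VE.
 */
void model_boot_setup(bool is_tdx_guest)
{
	idle_ops.safe_halt = is_tdx_guest ? hypercall_safe_halt
					  : native_safe_halt;
}

/* Idle entry: always goes through the routine selected at boot. */
enum halt_path model_idle(void)
{
	assert(idle_ops.safe_halt != NULL);
	return idle_ops.safe_halt();
}
```

Calling model_boot_setup(true) before the first model_idle() routes every idle through the hypercall stand-in.  The point of choosing at boot, rather than reacting to the first HLT #VE, is that no update has to be delayed until it is safe to patch the idle routine.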