From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andy Lutomirski
Subject: Re: Should SEV-ES #VC use IST? (Re: [PATCH] Allow RDTSC and RDTSCP from userspace)
Date: Tue, 23 Jun 2020 11:26:52 -0700
Message-ID:
References: <20200425202316.GL21900@8bytes.org>
 <20200623094519.GF31822@suse.de>
 <20200623104559.GA4817@hirez.programming.kicks-ass.net>
 <20200623111107.GG31822@suse.de>
 <20200623111443.GC4817@hirez.programming.kicks-ass.net>
 <20200623114324.GA14101@suse.de>
 <20200623115014.GE4817@hirez.programming.kicks-ass.net>
 <20200623121237.GC14101@suse.de>
 <20200623130322.GH4817@hirez.programming.kicks-ass.net>
 <9e3f9b2a-505e-dfd7-c936-461227b4033e@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Return-path:
In-Reply-To: <9e3f9b2a-505e-dfd7-c936-461227b4033e@citrix.com>
Sender: linux-kernel-owner@vger.kernel.org
To: Andrew Cooper
Cc: Peter Zijlstra, Joerg Roedel, Andy Lutomirski, Joerg Roedel,
 Dave Hansen, Tom Lendacky, Mike Stunes, Dan Williams, Dave Hansen,
 "H. Peter Anvin", Juergen Gross, Jiri Slaby, Kees Cook, kvm list, LKML,
 Thomas Hellstrom, Linux Virtualization, X86 ML, Sean Christopherson
List-Id: virtualization@lists.linuxfoundation.org

On Tue, Jun 23, 2020 at 8:23 AM Andrew Cooper wrote:
>
> On 23/06/2020 14:03, Peter Zijlstra wrote:
> > On Tue, Jun 23, 2020 at 02:12:37PM +0200, Joerg Roedel wrote:
> >> On Tue, Jun 23, 2020 at 01:50:14PM +0200, Peter Zijlstra wrote:
> >>> If SNP is the sole reason #VC needs to be IST, then I'd strongly urge
> >>> you to only make it IST if/when you try and make SNP happen, not before.
> >> It is not the only reason, when ES guests gain debug register support
> >> then #VC also needs to be IST, because #DB can be promoted into #VC
> >> then, and as #DB is IST for a reason, #VC needs to be too.
> > Didn't I read somewhere that that is only so for Rome/Naples but not for
> > the later chips (Milan) which have #DB pass-through?
>
> I don't know about hardware timelines, but some future part can now opt
> in to having debug registers as part of the encrypted state, and swapped
> by VMExit, which would make debug facilities generally usable, and
> supposedly safe to the #DB infinite loop issues, at which point the
> hypervisor need not intercept #DB for safety reasons.
>
> It's worth noting that on current parts, the hypervisor can set up debug
> facilities on behalf of the guest (or behind its back) as the DR state
> is unencrypted, but that attempting to intercept #DB will redirect to
> #VC inside the guest and cause fun.  (Also spare a thought for 32bit
> kernels which have to cope with userspace singlestepping the SYSENTER
> path with every #DB turning into #VC.)

What do you mean, 32-bit?  64-bit kernels have exactly the same
problem.  At least the stack is okay, though.

Anyway, since I'm way behind on this thread, here are some thoughts:

First, I plan to implement actual precise recursion detection for the
IST stacks.  We'll be able to reliably panic when disallowed recursion
happens.

Second, I don't object *that* strongly to switching to a second #VC
stack if an NMI or MCE happens, but we really need to make sure we
cover *all* the bases.  And #VC is distressingly close to "happens at
all kinds of unfortunate times and the guest doesn't actually have
much ability to predict it" right now.  So we have #VC + #DB + #VC,
#VC + NMI + #VC, #VC + MCE + #VC, and even worse options.  So switching
stacks reliably may not be possible in a clean way.

Let me contemplate.  And maybe produce some code soon.
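
For readers following along, here is a rough userspace model of what
"recursion detection for the IST stacks" could mean.  It is only an
illustration, not the kernel's code and not the code promised above:
every name in it (ist_enter, ist_exit, struct cpu_ist_state, the enum)
is made up for this sketch.  It relies on one hardware fact: an IST
entry always reloads RSP from the same TSS slot, so re-entering an IST
stack that still holds a live frame overwrites that frame.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical identifiers for the IST stacks discussed in the thread. */
enum ist_stack { IST_DB, IST_NMI, IST_MCE, IST_VC, IST_COUNT };

/* Hypothetical per-CPU bookkeeping: is a frame currently live on each stack? */
struct cpu_ist_state {
    bool in_use[IST_COUNT];
};

/*
 * Model of exception entry on an IST stack.
 * Returns false when the stack already holds a live frame, which is the
 * point where a real kernel would have to panic (or have switched stacks
 * beforehand), because the earlier frame has just been clobbered.
 */
bool ist_enter(struct cpu_ist_state *s, enum ist_stack which)
{
    if (s->in_use[which])
        return false;            /* disallowed recursion detected */
    s->in_use[which] = true;
    return true;
}

/* Model of exception return: the frame on this stack is no longer live. */
void ist_exit(struct cpu_ist_state *s, enum ist_stack which)
{
    assert(s->in_use[which]);    /* exit without a matching entry is a bug */
    s->in_use[which] = false;
}
```

With this model, the #VC + NMI + #VC sequence shows up as a third
ist_enter() call on IST_VC returning false while the first #VC frame is
still live.  The same holds for the #DB and MCE variants.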