From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ben Catterall
Subject: Re: [RFC 3/4] HVM x86 deprivileged mode: Code for switching into/out of deprivileged mode
Date: Mon, 17 Aug 2015 14:53:52 +0100
Message-ID: <55D1E770.5070906@citrix.com>
References: <1438879519-564-1-git-send-email-Ben.Catterall@citrix.com> <1438879519-564-4-git-send-email-Ben.Catterall@citrix.com> <20150810094928.GC3094@deinos.phlegethon.org> <55C87989.6050700@citrix.com> <20150811095535.GA884@deinos.phlegethon.org> <55CA2824.4020405@citrix.com> <20150811170522.GD884@deinos.phlegethon.org> <55CA2E91.4030204@citrix.com> <55CA3EF3.7090001@oracle.com> <55CB4A56.1000600@citrix.com> <55CB4B14.8060704@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
In-Reply-To: <55CB4B14.8060704@citrix.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Andrew Cooper, Boris Ostrovsky, Tim Deegan
Cc: xen-devel@lists.xensource.com, keir@xen.org, ian.campbell@citrix.com, george.dunlap@eu.citrix.com, Aravind Gopalakrishnan, jbeulich@suse.com, Suravee Suthikulpanit
List-Id: xen-devel@lists.xenproject.org

On 12/08/15 14:33, Andrew Cooper wrote:
> On 12/08/15 14:29, Andrew Cooper wrote:
>> On 11/08/15 19:29, Boris Ostrovsky wrote:
>>> On 08/11/2015 01:19 PM, Andrew Cooper wrote:
>>>> On 11/08/15 18:05, Tim Deegan wrote:
>>>>>>>> * Under this model, PV exception handlers should copy themselves
>>>>>>>>   onto the privileged execution stack.
>>>>>>>> * Currently, the IST handlers copy themselves onto the primary
>>>>>>>>   stack if they interrupt guest context.
>>>>>>>> * AMD Task Register on vmexit. (this old gem)
>>>>>>> Gah, this thing.
>>>>>> Curious (and I can't seem to find this in the manuals): What is this
>>>>>> thing?
>>>>> IIRC: AMD processors don't context switch TR on vmexit,
>>>> Correct
>>>>
>>>>> which makes using IST handlers tricky there.
>>>> (That is one way of putting it)
>>>>
>>>> IST handlers cannot be used by Xen if Xen does not switch the task
>>>> register before stgi, or IST exceptions (NMI, MCE and double fault) will
>>>> be taken with guest-supplied stack pointers.
>>>>
>>>>> We'd have to do the TR context switch ourselves, and that would be
>>>>> expensive.
>>>> It is suspected to be expensive, but I have never actually seen any
>>>> numbers one way or another.
>>>>
>>>>> Andrew, am I remembering that right?
>>>> Looks about right.
>>>>
>>>> I have been meaning to investigate this for a while, but never had
>>>> the time.
>>>>
>>>> Xen opts for disabling interrupt stack tables in the context of AMD HVM
>>>> vcpus, which interacts catastrophically with debug builds using
>>>> MEMORY_GUARD. MEMORY_GUARD shoots a page out of the primary stack to
>>>> detect stack overflows, but without an IST double fault handler, ends in
>>>> a triple fault rather than a host crash detailing the stack overflow.
>>>>
>>>> KVM unilaterally reloads the host task register on vmexit, and I suspect
>>>> this is probably the way to go, but have not had time to investigate
>>>> whether there is any performance impact from doing so. Given how little
>>>> of a TSS is actually used in long mode, I wouldn't expect an `ltr` to be
>>>> as expensive as it might have been in legacy modes.
>>>>
>>>> (CC'ing the AMD SVM maintainers to see if they have any information on
>>>> this subject)
>>>>
>>> I actually didn't even realize that TR is not saved on vmexit ;-/.
>>>
>>> Would switching TR only when we know that we need to enter this
>>> deprivileged mode help?
>> This is an absolute must. It is not safe to use syscall/sysexit without
>> IST in place for NMIs and MCEs.
>>
>>> Assuming that it is less expensive than copying the stack.
>> I was referring to the stack overflow issue, and whether it might be
>> sensible to pro-actively which TR.
>
> Ahem! s/which/switch/
>
> ~Andrew
>
So, have we arrived at a decision for this?

Thanks!
Ben
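To make the "guest-supplied stack pointers" problem concrete, here is a simplified C model of the long-mode TSS and of how the CPU picks a stack when delivering an exception. It assumes, as described in the thread, that after #VMEXIT on AMD the task register still refers to a guest-controlled TSS. The names `struct tss64` and `select_stack` are hypothetical and are not Xen's definitions, and the 16-byte alignment the CPU applies to the new RSP is ignored. The struct also shows how little of the TSS is used in long mode: ten stack pointers and an I/O bitmap offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the 64-bit TSS (field names are illustrative).
 * In long mode there is no hardware task switching, so the TSS only
 * supplies stack pointers and the I/O permission bitmap offset.
 */
struct tss64 {
    uint32_t reserved0;
    uint64_t rsp[3];       /* RSP0-RSP2: used on a privilege-level change */
    uint64_t reserved1;
    uint64_t ist[7];       /* IST1-IST7: used when an IDT gate names a slot */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t iopb_offset;  /* offset of the I/O permission bitmap */
} __attribute__((packed));

_Static_assert(sizeof(struct tss64) == 104, "64-bit TSS is 104 bytes");
_Static_assert(offsetof(struct tss64, ist) == 0x24, "IST1 lives at offset 0x24");

/*
 * Simplified stack selection for exception delivery in 64-bit mode.
 * The new RSP is read from whichever TSS the task register refers to,
 * so if TR still points at a guest TSS, the result is guest-controlled.
 * (The CPU's 16-byte alignment of the new RSP is not modelled.)
 */
static uint64_t select_stack(const struct tss64 *tr, unsigned ist_index,
                             int privilege_change, uint64_t current_rsp)
{
    assert(ist_index <= 7);
    if (ist_index != 0)        /* gate has a nonzero IST field: always switch */
        return tr->ist[ist_index - 1];
    if (privilege_change)      /* e.g. ring 3 -> ring 0: use RSP0 */
        return tr->rsp[0];
    return current_rsp;        /* same privilege, no IST: keep current stack */
}
```

With IST slot 2 holding different values in a host and a guest TSS, `select_stack(&guest, 2, 0, rsp)` returns the guest's value, which is the case Andrew describes for NMI, MCE and double fault. The last branch is also why MEMORY_GUARD overflows escalate: a fault taken in Xen with no IST slot and no privilege change reuses the overflowed stack, so the #PF becomes a #DF, and without an IST slot for #DF that fails the same way and ends in a triple fault.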
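On reloading the host task register with `ltr` as described for KVM: architecturally, `ltr` raises #GP unless the GDT descriptor it names is an available TSS (type 0x9 in long mode), and it marks that descriptor busy (type 0xB) when it succeeds. Since the host descriptor was already marked busy by the original `ltr` at boot, any reload path on vmexit has to clear the busy bit first. The sketch below models only that descriptor bookkeeping in plain C. `model_ltr` and `reload_host_tr` are hypothetical stand-ins, because the real instruction can only execute at CPL 0.

```c
#include <assert.h>
#include <stdint.h>

/*
 * A 64-bit TSS descriptor occupies 16 bytes of the GDT. Byte 5 is the
 * access byte: bits 0-3 type, bit 4 S (0 = system), bits 5-6 DPL, bit 7 P.
 */
enum {
    TSS_ACCESS_BYTE   = 5,
    TSS_TYPE_AVAIL64  = 0x9,   /* available 64-bit TSS */
    TSS_TYPE_BUSY64   = 0xB,   /* busy 64-bit TSS */
    TSS_TYPE_BUSY_BIT = 0x2,   /* the only bit that differs between the two */
};

static unsigned tss_desc_type(const uint8_t desc[16])
{
    return desc[TSS_ACCESS_BYTE] & 0xFu;
}

/* Hypothetical model of LTR's descriptor handling (no real LTR runs here):
 * returns -1 where hardware would raise #GP, otherwise marks it busy. */
static int model_ltr(uint8_t desc[16])
{
    if (tss_desc_type(desc) != TSS_TYPE_AVAIL64)
        return -1;
    desc[TSS_ACCESS_BYTE] |= TSS_TYPE_BUSY_BIT;
    return 0;
}

/* Hypothetical host-TR reload after vmexit: the host descriptor is still
 * busy from the LTR done at boot, so reset it to available first. */
static int reload_host_tr(uint8_t desc[16])
{
    assert(tss_desc_type(desc) == TSS_TYPE_AVAIL64 ||
           tss_desc_type(desc) == TSS_TYPE_BUSY64);
    desc[TSS_ACCESS_BYTE] &= (uint8_t)~TSS_TYPE_BUSY_BIT;
    return model_ltr(desc);
}
```

Calling `model_ltr` twice on the same descriptor fails the second time, which mirrors why a naive re-`ltr` on vmexit would #GP; `reload_host_tr` succeeds because it resets the type to available first. The model says nothing about cost, which remains the open performance question in this thread.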