From: Andrew Cooper
Date: Mon, 27 Jul 2015 14:52:32 +0100
To: Andy Lutomirski
CC: X86 ML, Boris Ostrovsky, "linux-kernel@vger.kernel.org", "Borislav Petkov", Steven Rostedt, "xen-devel@lists.xen.org"
Subject: Re: Getting rid of invalid SYSCALL RSP under Xen?
Message-ID: <55B637A0.5000101@citrix.com>
References: <55B53636.80304@citrix.com> <55B567C1.3050709@citrix.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 27/07/15 00:27, Andy Lutomirski wrote:
>
>>>>> For SYSRET, I think the way to go is to force Xen to always use the
>>>>> syscall slow path. Instead, Xen could hook into
>>>>> syscall_return_via_sysret or even right before the opportunistic
>>>>> sysret stuff. Then we could remove the USERGS_SYSRET hooks entirely.
>>>>>
>>>>> Would this work?
>>>> None of the opportunistic sysret stuff makes sense under Xen. The path
>>>> will inevitably end up in xen_iret making a hypercall. Short circuiting
>>>> all of this seems like a good idea, especially if it allows for the
>>>> removal of the USERGS_SYSRET.
>>> Doesn't Xen decide what to do based on VGCF_IN_SYSCALL? Maybe Xen
>>> should have its own opportunistic VGCF_IN_SYSCALL logic.
>> VGCF_in_syscall affects whether the extra r11/rcx get restored or not,
>> as the hypercall itself is implemented using syscall. As the extra
>> r11/rcx (and rax for that matter) are unconditionally saved in the
>> hypercall stub, I can't see anything Linux could usefully do,
>> opportunistically speaking.
> Xen does:
>
> /* %rbx: struct vcpu, interrupts disabled */
> restore_all_guest:
>         ASSERT_INTERRUPTS_DISABLED
>         RESTORE_ALL
>         testw $TRAP_syscall,4(%rsp)
>         jz    iret_exit_to_guest
>
>         /* Don't use SYSRET path if the return address is not canonical. */
>         movq  8(%rsp),%rcx
>         sarq  $47,%rcx
>         incl  %ecx
>         cmpl  $1,%ecx
>         ja    .Lforce_iret
>
>         cmpw  $FLAT_USER_CS32,16(%rsp)# CS
>         movq  8(%rsp),%rcx            # RIP
>         movq  24(%rsp),%r11           # RFLAGS
>         movq  32(%rsp),%rsp           # RSP
>         je    1f
>         sysretq
> 1:      sysretl
>
> That's essentially the same thing as opportunistic sysret. If Linux
> stops setting VGCF_in_syscall, though, I think we'll bypass that code,
> which will hurt performance. Whether this should be fixed in the
> hypervisor or in the guest kernel hooks, I don't know, but it would be
> easy to have a very simple xen_opportunistic_sysret path that checks
> rcx==rip and r11==rflags and, if so, sets VGCF_in_syscall.

I see your point. I didn't intend to suggest that Linux should stop
setting VGCF_in_syscall, as it is the only entity which knows whether it
is safe to clobber rcx/r11 in user context.

Having said this, Xen could certainly do its own opportunistic sysret
calculations as well. There are a number of issues in the Xen sysret
code which I plan to fix in due course, and I will see about making this
adjustment.

>
>>> Hmm, maybe some of this would be easier to think about if, rather than
>>> having a paravirt op, we could have:
>>>
>>> ALTERNATIVE "", "jmp xen_pop_things_and_iret", X86_FEATURE_XEN
>>>
>>> Or just IF_XEN("jmp ...");
>>>
>>> As a practical matter, x86_64 has native and Xen -- I don't think
>>> there's any other paravirt platform that needs the asm hooks.
>> It would certainly seem so. A careful use of IF_XEN() or two would make
>> the code far clearer to read, and to drop the hooks.
>>
> Want to add an IF_XEN macro?

I currently have a blocker bug against the impending Xen 4.6 release
which is higher on my todo list, but I will look into this as soon as I
can.

~Andrew
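
For anyone following the quoted restore_all_guest snippet, the two checks discussed in this thread can be sketched in C. This is only an illustration, not code from Xen or Linux: struct example_frame, the example_* function names and the EXAMPLE_FLAG_IN_SYSCALL value are all hypothetical placeholders. The first function mirrors the sarq/incl/cmpl/ja sequence for 48-bit canonical addresses; the second mirrors the proposed "rcx==rip and r11==rflags" test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the saved user register frame. */
struct example_frame {
    uint64_t rip;
    uint64_t rflags;
    uint64_t rcx;
    uint64_t r11;
};

/* Placeholder bit; NOT the real VGCF_in_syscall value. */
#define EXAMPLE_FLAG_IN_SYSCALL (1u << 0)

/*
 * Mirrors: sarq $47,%rcx; incl %ecx; cmpl $1,%ecx; ja .Lforce_iret
 *
 * An arithmetic shift by 47 leaves 0 for a canonical lower-half address
 * and -1 for a canonical upper-half address. Adding 1 to the low 32 bits
 * gives 1 or 0, so anything above 1 (unsigned) is non-canonical.
 * Assumes the compiler converts to int64_t by wrapping and shifts signed
 * values arithmetically, as GCC and Clang do.
 */
static bool example_is_canonical(uint64_t addr)
{
    int64_t top = (int64_t)addr >> 47;

    return (uint32_t)top + 1u <= 1u;
}

/*
 * Mirrors the suggested check: if rcx and r11 already hold the values
 * SYSRET would load into them (rip and rflags), restoring them is
 * harmless, so the "in syscall" flag can be set.
 */
static unsigned int example_sysret_flags(const struct example_frame *f)
{
    if (f->rcx == f->rip && f->r11 == f->rflags)
        return EXAMPLE_FLAG_IN_SYSCALL;
    return 0;
}
```

A caller would compute example_sysret_flags() on the frame it is about to return through and pass the result along with the iret hypercall. The canonical check stays a separate function here because, in the quoted code, that test is done on the Xen side.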