From: Jeremy Fitzhardinge
Subject: Re: [PATCH 6/6] mini-os/x86-64 entry: check against nested events and try to fix up
Date: Sat, 09 Mar 2013 14:44:19 -0800
Message-ID: <513BBB43.6060201@goop.org>
References: <1362778219-8576-1-git-send-email-xzhang@cs.uic.edu> <1362778219-8576-7-git-send-email-xzhang@cs.uic.edu>
In-Reply-To: <1362778219-8576-7-git-send-email-xzhang@cs.uic.edu>
To: Xu Zhang
Cc: samuel.thibault@ens-lyon.org, stefano.stabellini@eu.citrix.com, gm281@cam.ac.uk, xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On 03/08/2013 01:30 PM, Xu Zhang wrote:
> +# [How we do the fixup]. We want to merge the current stack frame with the
> +# just-interrupted frame. How we do this depends on where in the critical
> +# region the interrupted handler was executing, and so how many saved
> +# registers are in each frame. We do this quickly using the lookup table
> +# 'critical_fixup_table'. For each byte offset in the critical region, it
> +# provides the number of bytes which have already been popped from the
> +# interrupted stack frame. This is the number of bytes from the current stack
> +# that we need to copy at the end of the previous activation frame so that
> +# we can continue as if we've never even reached 11 running in the old
> +# activation frame.
> +critical_region_fixup:
> +        addq $critical_fixup_table - scrit, %rax
> +        movzbq (%rax),%rax      # %rax contains num bytes popped
> +        mov %rsp,%rsi
> +        add %rax,%rsi           # %esi points at end of src region
> +
> +        movq RSP(%rsp),%rdi     # acquire interrupted %rsp from current stack frame
> +                                # %edi points at end of dst region
> +        mov %rax,%rcx
> +        shr $3,%rcx             # convert bytes into count of 64-bit entities
> +        je 16f                  # skip loop if nothing to copy
> +15:     subq $8,%rsi            # pre-decrementing copy loop
> +        subq $8,%rdi
> +        movq (%rsi),%rax
> +        movq %rax,(%rdi)
> +        loop 15b
> +16:     movq %rdi,%rsp          # final %rdi is top of merged stack
> +        andb $KERNEL_CS_MASK,CS(%rsp)   # CS on stack might have changed
> +        jmp 11b
> +
> +
> +/* Nested event fixup look-up table*/
> +critical_fixup_table:
> +        .byte 0x00,0x00,0x00                    # XEN_TEST_PENDING(%rsi)
> +        .byte 0x00,0x00,0x00,0x00,0x00,0x00     # jnz 14f
> +        .byte 0x00,0x00,0x00,0x00               # mov (%rsp),%r15
> +        .byte 0x00,0x00,0x00,0x00,0x00          # mov 0x8(%rsp),%r14
> +        .byte 0x00,0x00,0x00,0x00,0x00          # mov 0x10(%rsp),%r13
> +        .byte 0x00,0x00,0x00,0x00,0x00          # mov 0x18(%rsp),%r12
> +        .byte 0x00,0x00,0x00,0x00,0x00          # mov 0x20(%rsp),%rbp
> +        .byte 0x00,0x00,0x00,0x00,0x00          # mov 0x28(%rsp),%rbx
> +        .byte 0x00,0x00,0x00,0x00               # add $0x30,%rsp
> +        .byte 0x30,0x30,0x30,0x30               # mov (%rsp),%r11
> +        .byte 0x30,0x30,0x30,0x30,0x30          # mov 0x8(%rsp),%r10
> +        .byte 0x30,0x30,0x30,0x30,0x30          # mov 0x10(%rsp),%r9
> +        .byte 0x30,0x30,0x30,0x30,0x30          # mov 0x18(%rsp),%r8
> +        .byte 0x30,0x30,0x30,0x30,0x30          # mov 0x20(%rsp),%rax
> +        .byte 0x30,0x30,0x30,0x30,0x30          # mov 0x28(%rsp),%rcx
> +        .byte 0x30,0x30,0x30,0x30,0x30          # mov 0x30(%rsp),%rdx
> +        .byte 0x30,0x30,0x30,0x30,0x30          # mov 0x38(%rsp),%rsi
> +        .byte 0x30,0x30,0x30,0x30,0x30          # mov 0x40(%rsp),%rdi
> +        .byte 0x30,0x30,0x30,0x30               # add $0x50,%rsp
> +        .byte 0x80,0x80,0x80,0x80               # testl $NMI_MASK,2*8(%rsp)
> +        .byte 0x80,0x80,0x80,0x80
> +        .byte 0x80,0x80                         # jnz 2f
> +        .byte 0x80,0x80,0x80,0x80               # testb $1,(xen_features+XENFEAT_supervisor_mode_kernel)
> +        .byte 0x80,0x80,0x80,0x80
> +        .byte 0x80,0x80                         # jnz 1f
> +        .byte 0x80,0x80,0x80,0x80,0x80          # orb $3,1*8(%rsp)
> +        .byte 0x80,0x80,0x80,0x80,0x80          # orb $3,4*8(%rsp)
> +        .byte 0x80,0x80                         # iretq
> +        .byte 0x80,0x80,0x80,0x80               # andl $~NMI_MASK, 16(%rsp)
> +        .byte 0x80,0x80,0x80,0x80
> +        .byte 0x80,0x80                         # pushq $\flag
> +        .byte 0x78,0x78,0x78,0x78,0x78          # jmp hypercall_page + (__HYPERVISOR_iret * 32)
> +        .byte 0x00,0x00,0x00,0x00               # XEN_LOCKED_BLOCK_EVENTS(%rsi)
> +        .byte 0x00,0x00,0x00                    # mov %rsp,%rdi
> +        .byte 0x00,0x00,0x00,0x00,0x00          # jmp 11b

This looks super-fragile. The original Xen-linux kernel code had a
similar kind of fixup table, but I went to some lengths to make it as
simple and robust as possible in the pvops kernels. See the comment in
xen-asm_32.S:

 * Because the nested interrupt handler needs to deal with the current
 * stack state in whatever form its in, we keep things simple by only
 * using a single register which is pushed/popped on the stack.

64-bit pvops Linux always uses the iret hypercall, so the issue is moot
there. (In principle a nested kernel interrupt could avoid the iret,
but it wasn't obvious that the extra complexity was worth it.)

    J
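For readers following the patch, the copy loop in critical_region_fixup can be modelled roughly in C. This is a sketch under stated assumptions, not mini-os code: the stack is an array of 64-bit slots whose index grows with the address (so pushing decrements the index, like %rsp), and merge_frames and its parameters are hypothetical names. It copies the bottom popped_bytes of the nested frame to just below the interrupted %rsp and returns the new top of the merged frame, mirroring the 15:/16: loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of critical_region_fixup (not mini-os code).
 *   stack        - array of 64-bit slots; lower index == lower address
 *   cur_sp       - index of the nested event's frame (%rsp on entry)
 *   old_sp       - interrupted %rsp, read from RSP(%rsp) in the asm
 *   popped_bytes - value fetched from critical_fixup_table for the
 *                  interrupted byte offset
 * Returns the index of the merged frame, i.e. the new %rsp.
 */
static size_t merge_frames(uint64_t *stack, size_t cur_sp, size_t old_sp,
                           unsigned popped_bytes)
{
    /* The asm converts the byte count to 64-bit words with "shr $3",
     * so table entries are expected to be multiples of 8. */
    assert(popped_bytes % 8 == 0);

    size_t src = cur_sp + popped_bytes / 8; /* end of source region (%rsi) */
    size_t dst = old_sp;                    /* end of destination region (%rdi) */
    size_t count = popped_bytes >> 3;       /* number of words to copy (%rcx) */

    /* Pre-decrementing copy, highest word first, like the "15:" loop. */
    while (count--) {
        --src;
        --dst;
        stack[dst] = stack[src];
    }
    return dst; /* final destination is the top of the merged stack */
}
```

Copying from the highest word down is the overlap-safe direction whenever the destination lies above the source, which is presumably the case here because the nested frame is pushed on the same stack below the interrupted %rsp.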
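The fragility raised in the reply comes from critical_fixup_table holding one entry per byte of machine code: each .byte row must match the encoded length of the instruction named in its comment, and nothing in the quoted patch checks that. The C sketch below illustrates the constraint by expanding (length, popped) pairs into such a per-byte table and rejecting any set of rows whose total does not equal the region size. struct crit_insn and build_fixup_table are hypothetical; in mini-os the lengths are decided by the assembler, so a real check of this kind would have to run at build time rather than in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one instruction in the critical region. */
struct crit_insn {
    uint8_t len;    /* encoded length in bytes, as emitted by the assembler */
    uint8_t popped; /* bytes already popped while executing this insn */
};

/*
 * Expand per-instruction rows into a per-byte table like
 * critical_fixup_table. Returns the number of bytes written, or 0 if
 * the rows do not add up to exactly region_len bytes.
 */
static size_t build_fixup_table(const struct crit_insn *insns, size_t n,
                                uint8_t *table, size_t region_len)
{
    size_t off = 0;

    for (size_t i = 0; i < n; i++) {
        /* Popped counts are later turned into 64-bit words. */
        assert(insns[i].popped % 8 == 0);
        if (off + insns[i].len > region_len)
            return 0; /* rows overrun the region: table would be wrong */
        for (uint8_t b = 0; b < insns[i].len; b++)
            table[off++] = insns[i].popped;
    }
    /* Rows that stop short of the region are just as wrong. */
    return off == region_len ? off : 0;
}
```

For example, the first three rows of the patch's table (XEN_TEST_PENDING, jnz 14f and mov (%rsp),%r15) would be written as {3, 0x00}, {6, 0x00} and {4, 0x00}.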