* Nested events in 64bit mini-OS
From: Xu Zhang @ 2012-10-23 22:43 UTC (permalink / raw)
To: xen-devel@lists.xen.org

Dear all,

I don't understand why 64-bit mini-OS doesn't check against nested Xen events (in entry.S). In the 32-bit code, a critical section is defined and re-entrance is checked; a fix-up routine is executed to coalesce the stack frames when re-entrance has occurred, because there is a window for nested events to happen after re-enabling event delivery and before a direct iret. In 64-bit mini-OS, however, a critical section is defined but not used.

My understanding is that, ideally, one could either a) unmask events and do a direct iret, but check against nested events and fix up the stack frames if one happens; or b) not re-enable events and use the iret hypercall to return (no need to fix up stack frames). 64-bit mini-OS seems to adopt a mix of both (in HYPERVISOR_IRET): mini-OS doesn't have a userspace, so unless an NMI happened, it always performs the interrupt/exception return with the machine instruction iret, without checking against nested events. This seems wrong to me. Am I missing something here? Please advise.

Thank you,
Xu

--
xu
:q!
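The re-entrance check described in option (a) boils down to a half-open range test on the instruction pointer saved by the nested event: a fixup is needed only if the interrupted RIP lies inside the critical region. A minimal C sketch of just that test — the addresses are hypothetical stand-ins for the `scrit`/`ecrit` assembler labels:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical addresses standing in for the scrit/ecrit labels. */
static const uintptr_t scrit = 0x1000;
static const uintptr_t ecrit = 0x1080;

/* A nested event needs a stack-frame fixup only when the interrupted
 * RIP lies inside the critical region: scrit <= rip < ecrit. */
static int needs_fixup(uintptr_t rip)
{
    return rip >= scrit && rip < ecrit;
}
```

Note the range is half-open: an event landing exactly at `ecrit` is already past the critical region and needs no fixup.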
* Re: Nested events in 64bit mini-OS
From: Samuel Thibault @ 2012-10-25 20:56 UTC (permalink / raw)
To: Xu Zhang; +Cc: gm281, xen-devel@lists.xen.org

Xu Zhang, on Tue 23 Oct 2012 17:43:28 -0500, wrote:
> 64-bit mini-OS seems to adopt a mix of both (in HYPERVISOR_IRET):
> mini-OS doesn't have a userspace, so unless an NMI happened, it always
> performs the interrupt/exception return with the machine instruction iret,
> without checking against nested events. This seems wrong to me. Am I
> missing something here?

I don't think you are missing anything. Cc-ing Grzegorz, who actually wrote the code.

Samuel
* Re: Nested events in 64bit mini-OS
From: Xu Zhang @ 2012-11-06 5:56 UTC (permalink / raw)
To: Samuel Thibault, xen-devel@lists.xen.org, gm281

[-- Attachment #1: Type: text/plain, Size: 1337 bytes --]

Dear all,

I haven't seen any updates on this matter, so I have tried to come up with a fix. Generally speaking, I want to mimic the 32-bit mini-OS behaviour, adding:

1) a fixup table: for each byte offset in the critical region, it provides the number of bytes which have already been popped from the interrupted stack frame;
2) a fixup routine: obtain the number of restored registers by looking up the fixup table, and coalesce the current stack frame with the just-interrupted one;
3) a check against re-entrance in hypervisor_callback.

The "git diff" output is attached to this email. I only did some naive "tests" by running it inside gdb. I am wondering: is there a test suite for mini-OS?

A follow-up question: given this fix, do you still need the iret hypercall?

Thank you very much.

Regards,
Xu Zhang

On 10/25/2012 03:56 PM, Samuel Thibault wrote:
> Xu Zhang, on Tue 23 Oct 2012 17:43:28 -0500, wrote:
>> 64-bit mini-OS seems to adopt a mix of both (in HYPERVISOR_IRET):
>> mini-OS doesn't have a userspace, so unless an NMI happened, it always
>> performs the interrupt/exception return with the machine instruction iret,
>> without checking against nested events. This seems wrong to me. Am I
>> missing something here?
> I don't think you are missing anything. Cc-ing Grzegorz, who actually
> wrote the code.
>
> Samuel

[-- Attachment #2: fixup.diff --]
[-- Type: text/x-patch, Size: 5515 bytes --]

diff --git a/extras/mini-os/arch/x86/x86_64.S b/extras/mini-os/arch/x86/x86_64.S
index a65e5d5..b4f351c 100644
--- a/extras/mini-os/arch/x86/x86_64.S
+++ b/extras/mini-os/arch/x86/x86_64.S
@@ -42,8 +42,10 @@ hypercall_page:
 
 NMI_MASK = 0x80000000
 
+#define RBX 40
 #define RDI 112
 #define ORIG_RAX 120 /* + error_code */
+#define RIP 128
 #define EFLAGS 144
 
 .macro RESTORE_ALL
@@ -147,7 +149,43 @@ error_call_handler:
 
 ENTRY(hypervisor_callback)
-        zeroentry hypervisor_callback2
+        movq (%rsp),%rcx
+        movq 8(%rsp),%r11
+        addq $0x10,%rsp                 /* skip rcx and r11 */
+        pushq $0                        /* push error code/oldrax */
+        pushq %rax                      /* push real oldrax to the rdi slot */
+        leaq hypervisor_callback2(%rip), %rax
+
+        /* rdi slot contains rax, oldrax contains error code */
+        cld
+        subq $14*8,%rsp
+        movq %rsi,13*8(%rsp)
+        movq 14*8(%rsp),%rsi            /* load rax from rdi slot */
+        movq %rdx,12*8(%rsp)
+        movq %rcx,11*8(%rsp)
+        movq %rsi,10*8(%rsp)            /* store rax */
+        movq %r8, 9*8(%rsp)
+        movq %r9, 8*8(%rsp)
+        movq %r10,7*8(%rsp)
+        movq %r11,6*8(%rsp)
+        movq %rbx,5*8(%rsp)
+        movq %rbp,4*8(%rsp)
+        movq %r12,3*8(%rsp)
+        movq %r13,2*8(%rsp)
+        movq %r14,1*8(%rsp)
+        movq %r15,(%rsp)
+        movq %rdi, RDI(%rsp)
+
+        # check against re-entrance
+        movq RIP(%rsp),%rbx
+        cmpq $scrit,%rbx
+        jb 10f
+        cmpq $ecrit,%rbx
+        jb critical_region_fixup
+
+10:     movq RBX(%rsp),%rbx             # restore rbx
+        movq %rsp,%rdi
+        call *%rax
 
 ENTRY(hypervisor_callback2)
         movq %rdi, %rsp
@@ -172,17 +210,40 @@ scrit:  /**** START OF CRITICAL REGION ****/
 14:     XEN_LOCKED_BLOCK_EVENTS(%rsi)
         XEN_PUT_VCPU_INFO(%rsi)
-        subq $6*8,%rsp
-        movq %rbx,5*8(%rsp)
-        movq %rbp,4*8(%rsp)
-        movq %r12,3*8(%rsp)
-        movq %r13,2*8(%rsp)
-        movq %r14,1*8(%rsp)
-        movq %r15,(%rsp)
-        movq %rsp,%rdi                  # set the argument again
+        pushq %rbx
+        pushq %rbp
+        pushq %r12
+        pushq %r13
+        pushq %r14
+        pushq %r15
+        movq %rsp,%rdi                  # set the argument again
         jmp 11b
 ecrit:  /**** END OF CRITICAL REGION ****/
+# [How we do the fixup]. We want to merge the current stack frame with the
+# just-interrupted frame. How we do this depends on where in the critical
+# region the interrupted handler was executing, and so how many saved
+# registers are in each frame. We do this quickly using the lookup table
+# 'critical_fixup_table'. For each byte offset in the critical region, it
+# provides the number of bytes which have already been popped from the
+# interrupted stack frame.
+critical_region_fixup:
+        addq $critical_fixup_table - scrit, %rbx
+        movzbq (%rbx),%rbx              # %rbx contains num bytes popped
+        mov %rsp,%rsi
+        add %rbx,%rsi                   # %esi points at end of src region
+        mov %rsp,%rdi
+        add $0xa8,%rdi                  # %edi points at end of dst region
+        mov %rbx,%rcx
+        shr $3,%rcx                     # convert bytes into count of 64-bit entities
+        je 16f                          # skip loop if nothing to copy
+15:     subq $8,%rsi                    # pre-decrementing copy loop
+        subq $8,%rdi
+        movq (%rsi),%rbx
+        movq %rbx,(%rdi)
+        loop 15b
+16:     movq %rdi,%rsp                  # final %rdi is top of merged stack
+        jmp 10b
 
 retint_kernel:
 retint_restore_args:
@@ -210,6 +271,42 @@ error_exit:
         jmp retint_kernel
 
+critical_fixup_table:
+        .byte 0x30,0x30,0x30                    # testb $0xff,(%rsi)
+        .byte 0x30,0x30                         # jne 14f
+        .byte 0x30,0x30,0x30,0x30               # mov (%rsp),%r11
+        .byte 0x30,0x30,0x30,0x30,0x30          # mov 0x8(%rsp),%r10
+        .byte 0x30,0x30,0x30,0x30,0x30          # mov 0x10(%rsp),%r9
+        .byte 0x30,0x30,0x30,0x30,0x30          # mov 0x18(%rsp),%r8
+        .byte 0x30,0x30,0x30,0x30,0x30          # mov 0x20(%rsp),%rax
+        .byte 0x30,0x30,0x30,0x30,0x30          # mov 0x28(%rsp),%rcx
+        .byte 0x30,0x30,0x30,0x30,0x30          # mov 0x30(%rsp),%rdx
+        .byte 0x30,0x30,0x30,0x30,0x30          # mov 0x38(%rsp),%rsi
+        .byte 0x30,0x30,0x30,0x30,0x30          # mov 0x40(%rsp),%rdi
+        .byte 0x30,0x30,0x30,0x30               # add $0x50,%rsp
+        .byte 0x80,0x80,0x80,0x80               # testl $NMI_MASK,2*8(%rsp)
+        .byte 0x80,0x80,0x80,0x80
+        .byte 0x80,0x80                         # jne 2f
+        .byte 0x80,0x80,0x80,0x80               # testb $1,(xen_features+XENFEAT_supervisor_mode_kernel)
+        .byte 0x80,0x80,0x80,0x80
+        .byte 0x80,0x80                         # jne 1f
+        .byte 0x80,0x80,0x80,0x80,0x80          # orb $0x3,0x8(%rsp)
+        .byte 0x80,0x80,0x80,0x80,0x80          # orb $0x3,0x20(%rsp)
+        .byte 0x80,0x80                         # iretq
+        .byte 0x80,0x80,0x80,0x80               # andl $~NMI_MASK,0x10(%rsp)
+        .byte 0x80,0x80,0x80,0x80
+        .byte 0x80,0x80                         # pushq $0x0
+        .byte 0x78,0x78,0x78,0x78,0x78          # jmp hypercall_page + (__HYPERVISOR_iret * 32)
+        .byte 0x30,0x30,0x30,0x30               # movb $0x1,0x1(%rsi)
+        .byte 0x30                              # push %rbx
+        .byte 0x28                              # push %rbp
+        .byte 0x20,0x20                         # push %r12
+        .byte 0x18,0x18                         # push %r13
+        .byte 0x10,0x10                         # push %r14
+        .byte 0x08,0x08                         # push %r15
+        .byte 0x00,0x00,0x00                    # mov %rsp,%rdi
+        .byte 0x00,0x00,0x00,0x00,0x00          # jmpq 11b
+
 ENTRY(failsafe_callback)
         popq %rcx

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
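The pointer arithmetic of the pre-decrementing copy loop in `critical_region_fixup` can be sanity-checked outside the guest with a plain C model: given the byte count looked up in the fixup table, it copies `popped/8` qwords from the bottom of the current frame to just below the end of the full frame and returns the merged stack pointer, exactly mirroring the assembly. This is only an illustrative model (the `merge_frames` name, `FRAME_BYTES` macro, and a flat array standing in for the stack are made up for the example):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_BYTES 0xa8  /* size of a full saved frame, as in the patch */

/* Model of the copy loop: 'rsp' is the top of the current (nested)
 * frame, 'popped' is the byte count from the fixup table.  Copies
 * popped/8 qwords from the bottom of the current frame to just below
 * rsp + FRAME_BYTES, and returns the merged stack pointer
 * rsp + FRAME_BYTES - popped, as the assembly's final %rdi does. */
static uint64_t *merge_frames(uint64_t *rsp, size_t popped)
{
    uint64_t *src = (uint64_t *)((uint8_t *)rsp + popped);      /* end of src region */
    uint64_t *dst = (uint64_t *)((uint8_t *)rsp + FRAME_BYTES); /* end of dst region */
    size_t n = popped / 8;  /* count of 64-bit entities */

    while (n--)             /* pre-decrementing copy loop */
        *--dst = *--src;
    return dst;             /* final dst is the top of the merged stack */
}
```

Note that the C `while (n--)` naturally handles `popped == 0`, whereas the assembly needs the explicit `je 16f` guard because `loop` with %rcx = 0 would iterate 2^64 times.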
* Re: Nested events in 64bit mini-OS
From: Samuel Thibault @ 2012-11-10 13:52 UTC (permalink / raw)
To: Xu Zhang; +Cc: gm281, xen-devel@lists.xen.org

Hello,

Xu Zhang, on Mon 05 Nov 2012 23:56:04 -0600, wrote:
> I haven't seen any updates on this matter, so I have tried to come up with a fix.

Thanks!

> Generally speaking, I want to mimic the 32-bit mini-OS behaviour,

Ok. Probably a good thing :)

> The "git diff" output is attached to this email. I only did some naive
> "tests" by running it inside gdb.

Do you mean that you actually ran a stack merge, even if only in gdb? I would trust that well enough.

> I am wondering: is there a test suite for mini-OS?

You can simply run it, giving it a blk dev (or also a fb dev); it will run app_main(), which keeps reading stuff and draws squares on the fb. Ideally you should manage to trigger fixups by stuffing a wait loop in the critical section.

> A follow-up question: given this fix, do you still need the iret hypercall?

Actually it's worse than that, see below.

> +        movq %rbx,5*8(%rsp)
...
> +        # check against re-entrance
> +        movq RIP(%rsp),%rbx
> +        cmpq $scrit,%rbx
> +        jb 10f
> +        cmpq $ecrit,%rbx
> +        jb critical_region_fixup
> +
> +10:     movq RBX(%rsp),%rbx             # restore rbx

Do we really need to restore %rbx? I don't think do_hypervisor_callback uses it.

> +        .byte 0x78,0x78,0x78,0x78,0x78          # jmp hypercall_page + (__HYPERVISOR_iret * 32)

Here we would also need a fixup table for the code at hypercall_page! A nicer fix would be to inline the hypercall code here. That said, I have to say I don't know any details about the NMI flag: I don't see it defined in the Intel manual, and I don't see it being set in the Xen hypervisor. Unless somebody knows more about it, I would assume that it currently never happens, and simply stuff a

    ud2 /* TODO: fix re-entrance fixup for hypercall in NMI case (when does that happen actually?) */

before the jmp, to catch it if it ever does happen.

Also, it would be good to check against critical section size changes, in case somebody e.g. changes a value or a macro like XEN_PUT_VCPU_INFO. For instance, stuff right after the table:

    .if (ecrit-scrit) != (critical_fixup_table_end - critical_fixup_table)
    .error "The critical region has changed, the fixup table needs updating"
    .endif

More generally, some cleanup could go along with the patch, but I'd say keep it as you have done, focused on only the fixup code, to have it as such in the repository history; then we can clean some things up afterwards.

Samuel
* Re: Nested events in 64bit mini-OS
From: Xu Zhang @ 2012-11-11 15:36 UTC (permalink / raw)
To: Samuel Thibault, xen-devel@lists.xen.org, gm281

Samuel,

Thanks for the reply.

On 11/10/2012 07:52 AM, Samuel Thibault wrote:
> You can simply run it, giving it a blk dev (or also a fb dev); it will
> run app_main(), which keeps reading stuff and draws squares on the fb.
>
> Ideally you should manage to trigger fixups by stuffing a wait loop in
> the critical section.

I'll try this in the following week. Thank you for pointing it out.

>> A follow-up question: given this fix, do you still need the iret
>> hypercall?
>
> Actually it's worse than that, see below.
>
>> +        movq %rbx,5*8(%rsp)
> ...
>> +        # check against re-entrance
>> +        movq RIP(%rsp),%rbx
>> +        cmpq $scrit,%rbx
>> +        jb 10f
>> +        cmpq $ecrit,%rbx
>> +        jb critical_region_fixup
>> +
>> +10:     movq RBX(%rsp),%rbx             # restore rbx
>
> Do we really need to restore %rbx? I don't think do_hypervisor_callback
> uses it.

No, I don't think so. I was just being cautious. :-)

>> +        .byte 0x78,0x78,0x78,0x78,0x78          # jmp hypercall_page + (__HYPERVISOR_iret * 32)
>
> Here we would also need a fixup table for the code at hypercall_page!

Nice catch!

> A nicer fix would be to inline the hypercall code here. That said, I
> have to say I don't know any details about the NMI flag: I don't see it
> defined in the Intel manual, and I don't see it being set in the Xen
> hypervisor. Unless somebody knows more about it, I would assume that it
> currently never happens, and simply stuff a
>
>     ud2 /* TODO: fix re-entrance fixup for hypercall in NMI case (when does that happen actually?) */
>
> before the jmp, to catch it if it ever does happen.

I don't know that either. I tried to raise an NMI in the guest domain with "xl trigger", but it seems that only works for HVM guests.

> Also, it would be good to check against critical section size changes, in
> case somebody e.g. changes a value or a macro like XEN_PUT_VCPU_INFO.
> For instance, stuff right after the table:
>
>     .if (ecrit-scrit) != (critical_fixup_table_end - critical_fixup_table)
>     .error "The critical region has changed, the fixup table needs updating"
>     .endif

Totally agree. It also saves one the chore of checking that the table size matches that of the critical section by looking at the disassembly output. :-)

> More generally, some cleanup could go along with the patch, but I'd say
> keep it as you have done, focused on only the fixup code, to have it
> as such in the repository history; then we can clean some things up
> afterwards.
>
> Samuel

I really appreciate your feedback. Thanks again, Samuel!

Xu
* Re: Nested events in 64bit mini-OS
From: Xu Zhang @ 2012-11-14 1:49 UTC (permalink / raw)
To: Samuel Thibault, xen-devel@lists.xen.org, gm281

>>> +        .byte 0x78,0x78,0x78,0x78,0x78          # jmp hypercall_page + (__HYPERVISOR_iret * 32)
>> Here we would also need a fixup table for the code at hypercall_page!
> Nice catch!
>
>> A nicer fix would be to inline the hypercall code here.

On second thought, it seems to me that such an "expansion" of the hypercall page into the fixup table is not necessary (and wrong). Instead, we can, and need to, mask events before doing an iret hypercall. I think it is safe (in fact, a must; see below) to do this because:

1. if events are disabled: it doesn't hurt to mask them again;
2. if events are enabled: we disable events and jump to hypercall_page to make an iret hypercall, which eventually calls do_iret. In do_iret, line 309:

    /* Restore upcall mask from supplied EFLAGS.IF. */
    vcpu_info(v, evtchn_upcall_mask) = !(iret_saved.rflags & X86_EFLAGS_IF);

No matter what the current value of the upcall mask is, do_iret always restores it to the negation of the rflags.IF value saved on the stack. In other words, it is safe to set the upcall mask before the iret hypercall.

In fact, I am afraid it is not just safe but necessary: one can guard against nested events in the guest OS, but never on the hypercall page (because that code is in Xen). We should never jump to the hypercall page's iret with events enabled, so we have to set the upcall mask before the jump. I guess that is one of the reasons why Xen always restores the upcall mask from the stack.

>> Also, it would be good to check against critical section size changes, in
>> case somebody e.g. changes a value or a macro like XEN_PUT_VCPU_INFO.
>> For instance, stuff right after the table:
>>
>>     .if (ecrit-scrit) != (critical_fixup_table_end - critical_fixup_table)
>>     .error "The critical region has changed, the fixup table needs updating"
>>     .endif
> Totally agree. It also saves one the chore of checking that the
> table size matches that of the critical section by looking at the
> disassembly output. :-)

Correct me if I am wrong, but I think hypercall_page is mapped into the guest OS at runtime by Xen. It is not actually part of the guest OS's critical section, at least not at compile time, so having fixup table entries for the code at hypercall_page would cause such a check to fail (mismatch). Following the discussion above, we can easily avoid such a fixup table by masking events. I don't know if this makes sense at all. :-)
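The do_iret behaviour argued above is easy to model: whatever the upcall mask is on entry, it is unconditionally recomputed from the saved rflags.IF, which is why masking events before the jump is harmless. A C sketch of just that one line (`restored_upcall_mask` is an illustrative name, not Xen's; the IF bit value is the architectural one):

```c
#include <assert.h>
#include <stdint.h>

#define X86_EFLAGS_IF (1u << 9)  /* architectural IF bit in rflags */

/* Models the quoted line of do_iret(): the upcall mask is recomputed
 * from the saved rflags.IF, ignoring whatever it was on entry. */
static uint8_t restored_upcall_mask(uint8_t mask_on_entry, uint64_t saved_rflags)
{
    (void)mask_on_entry;  /* do_iret never reads the current mask */
    return !(saved_rflags & X86_EFLAGS_IF);
}
```

In particular, a guest that masks events just before the iret hypercall (mask_on_entry = 1) still comes back with events enabled whenever the saved rflags had IF set.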
* Re: Nested events in 64bit mini-OS
From: Samuel Thibault @ 2012-11-18 17:43 UTC (permalink / raw)
To: Xu Zhang; +Cc: gm281, xen-devel@lists.xen.org

Xu Zhang, on Tue 13 Nov 2012 19:49:27 -0600, wrote:
> 1. if events are disabled: it doesn't hurt to mask them again;
> 2. if events are enabled: we disable events and jump to hypercall_page to
> make an iret hypercall, which eventually calls do_iret.
>
> In do_iret, line 309:
>     /* Restore upcall mask from supplied EFLAGS.IF. */
>     vcpu_info(v, evtchn_upcall_mask) = !(iret_saved.rflags & X86_EFLAGS_IF);

Ah, right. Disabling events just before the jmp seems all right to me then.

> Correct me if I am wrong, but I think hypercall_page is mapped into the
> guest OS at runtime by Xen. It is not actually part of the guest OS's
> critical section, at least not at compile time.

Sure. I meant it would mean a second fixup table; but who knows what code is there, it could be tampering with the stack.

> Following the discussion above, we can easily avoid such a fixup table
> by masking events.

Completely.

Samuel
* Re: Nested events in 64bit mini-OS
From: Ian Campbell @ 2012-11-19 10:22 UTC (permalink / raw)
To: Samuel Thibault
Cc: Keir Fraser, Xu Zhang, gm281@cam.ac.uk, xen-devel@lists.xen.org

On Sun, 2012-11-18 at 17:43 +0000, Samuel Thibault wrote:
> > Correct me if I am wrong, but I think hypercall_page is mapped into the
> > guest OS at runtime by Xen. It is not actually part of the guest OS's
> > critical section, at least not at compile time.
>
> Sure. I meant it would mean a second fixup table; but who knows what code
> is there, it could be tampering with the stack.

In practice it isn't. Since the hypercall page is a black box to the guest, I don't think it is really reasonable for it to be messing with the stack without making its own arrangements for the correctness of nesting etc.

Ian.
* Re: Nested events in 64bit mini-OS
From: Ian Campbell @ 2012-11-19 10:21 UTC (permalink / raw)
To: Samuel Thibault; +Cc: Xu Zhang, gm281@cam.ac.uk, xen-devel@lists.xen.org

On Sat, 2012-11-10 at 13:52 +0000, Samuel Thibault wrote:
> > +        .byte 0x78,0x78,0x78,0x78,0x78          # jmp hypercall_page + (__HYPERVISOR_iret * 32)
>
> Here we would also need a fixup table for the code at hypercall_page!
> A nicer fix would be to inline the hypercall code here. That said, I
> have to say I don't know any details about the NMI flag: I don't see it
> defined in the Intel manual, and I don't see it being set in the Xen
> hypervisor.

It's a "software defined" bit (i.e. stolen out of eflags :-() used in the classic-Xen Linux guest to indicate that the return must go via HYPERVISOR_iret (rather than a direct iret), I think in order to let the hypervisor implement the correct semantics wrt NMI masking. The hypervisor itself doesn't actually know anything about the bit, and the guest could implement this some other way if it wanted (doesn't 64-bit mini-OS *always* use HYPERVISOR_iret?).

> Unless somebody knows more about it, I would assume that it
> currently never happens,

That's probably a pretty good assumption. You would expect NMIs only in dom0 (real NMIs getting forwarded) or in a guest which you explicitly sent an NMI to via the toolstack, which I doubt you would do to an unsuspecting mini-os domain (not in the expectation of anything good happening, at least).

Ian.