Hi,

Petr Vandrovec wrote:
> In that new patch I set the const to 0xe00, which
> is 3,5K. Is it still the limitation? I can probably
> For 4KB stacks 2KB looks better
OK, done that. I wonder why, though: I don't need 2K
myself, I need only 24 bytes. So what prevents me from
raising the gap to 3.5K or so? Why does 2K look better?

>> 2. Set task gate for NMI. AFAICS, the task-gate is
> Yes. But you still have to handle exceptions on iret
> to userspace :-(
Yes. Thanks, I see the problem now. And I suppose
setting task gates for all those exceptions as well
is not an option...

>> Or maybe somehow to modify the NMI handler itself,
> If you can do that, great. But you have to modify
> at least NMI, GP and SS fault handlers to reload
> SS/ESP with correct values.
Yes, that's a problem too. It doesn't look too difficult
(I'd have to look up the stack pointer in the TSS, I guess,
then do an lss to it), but modifying all those handlers
for such a small purpose is probably overkill...

>> +.previous; \
>> + /* preparing the ESPfix here */ \
>> + /* reserve some space on stack for the NMI handler */ \
>> +8: subl $(NMI_STACK_RESERVE-4), %esp; \
>> + .rept 5; \
> Any reason why you left this part of RESTORE_ALL macro?
The reason is that the previous part of the macro can
jump to that part. So how can I split them?

> be moved out, IMHO. RESTORE_ALL is used twice in entry.S, so you
> could save one copy.
Do you mean the NMI return path doesn't need the ESP fix
at all? Why?

> Though I'm not sure why NMI handler simple
> does not jump to RESTORE_ALL we already have.
I can only change that and then un-macro RESTORE_ALL
completely, so I did that.
Additionally, I introduced a "short path" for the case
where we know for sure that we are returning to the kernel.
I am not setting an exception handler there, because
if the return to the kernel fails, it should die() anyway.
Is this correct?

>> + pushl 20(%esp); \
>> + andl $~(IF_MASK | TF_MASK | RF_MASK | NT_MASK | AC_MASK), (%esp); \
>> + /* we need at least IOPL1 to re-enable interrupts */ \
>> + orl $IOPL1_MASK, (%esp); \
>> + pushl $__ESPFIX_CS; \
>> + pushl $espfix_trampoline; \
> FYI, on my system (P4/1.6GHz) 100M loops of these pushes takes 1.20sec
> while written with subl $24,%esp and then doing movs to xx(%esp) takes
> 0.94sec.
OK, I wasn't sure which pushes you meant. I assumed
you wanted me to replace the 5 pushes that copy the
stack frame.

> Plus you could then reorganize code a bit (first do 5 pushes
> to copy stack, then subtract 24 from esp, and push eax/ebp after that.
> This way you can use %eax for computations you currently do in memory
Done, thanks! Does this help your test case?

>> + .quad 0x00cfba000000ffff /* 0xd0 - ESPfix CS */
>> + .quad 0x0000b2000000ffff /* 0xd8 - ESPfix SS */
> Set SS limit to 23 bytes, so misuse can be quickly catched?
Yes!

So the new patch is attached :)
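
For reference, here is a minimal sketch (Python, not part of the patch) of how the two quoted .quad descriptors decode and what the SS descriptor would look like with its limit field lowered to 23. The helper names decode_descriptor and with_limit are made up for illustration, and I am assuming that "limit 23" means the limit field value 0x17, i.e. offsets 0..23, which covers the 24 bytes mentioned above. The resulting constant is only an illustration of the encoding, not necessarily what the attached patch uses.

```python
# Sketch only: decode a legacy 32-bit x86 code/data segment descriptor.
# Helper names are hypothetical; field layout follows the standard
# 8-byte GDT entry format.

ESPFIX_CS = 0x00CFBA000000FFFF  # value quoted in the mail (0xd0)
ESPFIX_SS = 0x0000B2000000FFFF  # value quoted in the mail (0xd8)


def decode_descriptor(raw):
    """Split a 64-bit descriptor into its fields."""
    limit = (raw & 0xFFFF) | (((raw >> 48) & 0xF) << 16)
    base = ((raw >> 16) & 0xFFFFFF) | (((raw >> 56) & 0xFF) << 24)
    access = (raw >> 40) & 0xFF
    flags = (raw >> 52) & 0xF
    granular = bool(flags & 0x8)  # G bit: limit counted in 4K pages
    # Last valid offset for an expand-up segment.
    last_offset = ((limit << 12) | 0xFFF) if granular else limit
    return {
        "base": base,
        "limit": limit,
        "last_offset": last_offset,
        "dpl": (access >> 5) & 0x3,
        "is_code": bool(access & 0x08),
        "is_32bit": bool(flags & 0x4),  # D/B bit
    }


def with_limit(raw, limit):
    """Return the descriptor with its 20-bit limit field replaced."""
    cleared = raw & ~((0xF << 48) | 0xFFFF)
    return cleared | (limit & 0xFFFF) | (((limit >> 16) & 0xF) << 48)


if __name__ == "__main__":
    print(decode_descriptor(ESPFIX_CS))
    print(decode_descriptor(ESPFIX_SS))
    # SS with byte-granular limit 23 (offsets 0..23 are addressable)
    print(f"{with_limit(ESPFIX_SS, 23):#018x}")  # 0x0000b20000000017
```

Decoded this way, both descriptors are DPL 1 with base 0. The CS entry is a 32-bit code segment covering the full 4G, and the SS entry is a 16-bit, byte-granular data segment whose limit currently allows offsets up to 0xffff, so any access past the small frame would fault only once the limit is lowered.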