Hi. I'll reply to that mail again because my previous reply was done in a big hurry. The new patch is attached. I applied most of the optimizations you suggested and reverted the things you said were unnecessary/wrong - thanks. Does this one look better? Petr Vandrovec wrote: >> 1. I am not allocating the ring1 stack separately, >> I am allocating it on a ring0 stack. Much simpler >> code, although non-reentrant/preempt-unsafe. > I think that it is problem. And do not forget that NMI can occur at any > place, so I'm not quite sure how your code works. I assume this was a misunderstanding. Now I use "preempt_stop" instead of "cli" to make it more obvious what am I doing. So I assume that part is preempt-unsafe, but not NMI-unsafe. >> +#define NMI_STACK_RESERVE 0x400 >> +#define MAKE_ESPFIX \ >> + cli; \ >> + subl $NMI_STACK_RESERVE, %esp; \ > This is ugly. What if NMI handler uses more than 1KB? In that new patch I set the const to 0xe00, which is 3,5K. Is it still the limitation? I can probably raise it ever higher, but I am not sure if there is some sensible data below the stack that I may overwrite ocasionally? > And ESP is not > multiple of 4 here - it may slow down processor a bit. Fixed with "pushl/movw". > I'm not sure about lea speed. And fast path should NOT jump, > forward jumps are initialy predicted as not taken... Maybe: Done. Looks much better now! > And then you can move MAKE_ESPFIX out of macro. He-he, but RESTORE_ALL is a macro itself, so... :) > Maybe you want to introduce some per-thread flag, What kind of flag do you have in mind? Flag the modify_ldt() calls that set some segment to 16bit? But is it really worth the 8 insns I use? Checking the per-thread flag will cost you ~5 insns anyway, and a headache... > or leave syscall path > to corrupt ESP (dosemu apps should not call Linux INT syscall, yes?). I wanted to do that, but then recalled the comment of Pavel Machek, who thinks exporting the %esp value to user-space is an "informational leak". I personally don't see the security threat here, but it looks safe to assume that Pavel is right. In which case I think we have to root out the bug from every path, including the syscall. So I left that place as is for now. >> - if (regs->xcs & 3) >> + if (regs->xcs & 2) >> printk(" ESP: %04x:%08lx",0xffff & regs->xss,regs->esp); > I think you should leave this as is, regs->xss/regs->esp should be valid > for CPL1 exceptions. OK, everything you mentioned, is reverted. > Rest looks fine, but I'm not completely sure that SMP and NMIs are > handled in the best possible way. Well, even though my patch can be improved, I don't feel quite comfortable with its complexity. Now I really want to re-assure myself that this all is not possible to do on ring0, in which case this will simply be the trivial thing. Anonymous LKML reader (whom I have to thank for the hints) proposed 2 things: 1. Try lss followed by iret. In that case the interrupt will not kill us because after lss the interrupts are not accepted until the next insn is executed, AFAIK. But he was not sure, and I don't know either, if it is true even for NMI. So if it is - we could just do lss/iret, and the problem is solved. So is this true for NMI? Is this documented anywhere? 2. Set task gate for NMI. AFAICS, the task-gate is now used only for the doublefault exception, but not for NMI. If I understand the idea correctly, this will guarantee that the NMI will be executing on a separate stack, which may be a good idea in any case, and will allow us to use the small stack at ring-0, if only we disable the interrupts. Are there any chances to use the task-gate for NMI? I may even try to implement that myself, but I guess there are *reasons* why it is not yet? (of course this applies only if 1. fails) Or maybe somehow to modify the NMI handler itself, so that it will check if it is on a "small" stack, and switch to the "big" stack before anything else? Well, doing the whole trick at ring-0 sounds really plausible to me...