linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Stas Sergeev <stsp@aknet.ru>
To: Petr Vandrovec <vandrove@vc.cvut.cz>
Cc: linux-kernel@vger.kernel.org
Subject: Re: ESP corruption bug - what CPUs are affected?
Date: Sat, 25 Sep 2004 00:36:15 +0400	[thread overview]
Message-ID: <4154853F.6070105@aknet.ru> (raw)
In-Reply-To: <20040923180607.GA20678@vana.vc.cvut.cz>

[-- Attachment #1: Type: text/plain, Size: 2907 bytes --]

Hi,

Petr Vandrovec wrote:
> In that new patch I set the const to 0xe00, which
> is 3,5K. Is it still the limitation? I can probably
> For 4KB stacks 2KB looks better
OK, done that. Wondering though, for what?
I don't need 2K myself, I need 24 bytes only.
So what prevents me to raise the gap to 3.5K
or somesuch? Why 2K looks better?

>> 2. Set task gate for NMI. AFAICS, the task-gate is
> Yes. But you still have to handle exceptions on iret
> to userspace :-(
Yes. Thanks, I see the problem now. And I
suppose setting the task-gates also for all
that exceptions is not an option...

>> Or maybe somehow to modify the NMI handler itself,
> If you can do that, great.  But you have to modify
> at least NMI, GP and SS fault handlers to reload
> SS/ESP with correct values.
Yes, that's the problem either. Doesn't look too
difficult (I'll have to look up TSS for the stack
pointer I guess, then do lss to it), but probably
modifying all that handlers for only that small
purpose is an overkill...

>> +.previous; \
>> +	/* preparing the ESPfix here */ \
>> +	/* reserve some space on stack for the NMI handler */ \
>> +8:	subl $(NMI_STACK_RESERVE-4), %esp; \
>> +	.rept 5; \
> Any reason why you left this part of RESTORE_ALL macro?
The reason is that the previous part of the macro
can jump to that part. So how can I divide those?

> be moved out, IMHO.  RESTORE_ALL is used twice in entry.S, so you
> could save one copy.
Do you mean the NMI return path doesn't need
the ESP fix at all? Why?

> Though I'm not sure why NMI handler simple
> does not jump to RESTORE_ALL we already have.
I can only change that and then un-macro the
RESTORE_ALL completely. So I did that.
Additionally I introduced the "short path"
for the case where we know for sure that we
are returning to the kernel. And I am not
setting the exception handler there because
returning to the kernel if fails, should die()
anyway. Is this correct?

>> +	pushl 20(%esp); \
>> +	andl $~(IF_MASK | TF_MASK | RF_MASK | NT_MASK | AC_MASK), (%esp); \
>> +	/* we need at least IOPL1 to re-enable interrupts */ \
>> +	orl $IOPL1_MASK, (%esp); \
>> +	pushl $__ESPFIX_CS; \
>> +	pushl $espfix_trampoline; \
> FYI, on my system (P4/1.6GHz) 100M loops of these pushes takes 1.20sec
> while written with subl $24,%esp and then doing movs to xx(%esp) takes
> 0.94sec.
OK, I wasn't sure what pushes do you mean.
I supposed you wanted me to replace those
5 pushes that copy the stack frame.

> Plus you could then reorganize code a bit (first do 5 pushes
> to copy stack, then subtract 24 from esp, and push eax/ebp after that.
> This way you can use %eax for computations you currently do in memory
Done, thanks! Does this help your test-case?

>> +	.quad 0x00cfba000000ffff	/* 0xd0 - ESPfix CS */
>> +	.quad 0x0000b2000000ffff	/* 0xd8 - ESPfix SS */
> Set SS limit to 23 bytes, so misuse can be quickly catched?
Yes!

So the new patch is attached:)


[-- Attachment #2: linux-2.6.8-stacks4.diff --]
[-- Type: text/x-patch, Size: 9930 bytes --]

diff -urN linux-2.6.8-pcsp/arch/i386/kernel/entry.S linux-2.6.8-stacks/arch/i386/kernel/entry.S
--- linux-2.6.8-pcsp/arch/i386/kernel/entry.S	2004-06-10 13:28:35.000000000 +0400
+++ linux-2.6.8-stacks/arch/i386/kernel/entry.S	2004-09-24 15:35:46.000000000 +0400
@@ -70,15 +70,22 @@
 CF_MASK		= 0x00000001
 TF_MASK		= 0x00000100
 IF_MASK		= 0x00000200
-DF_MASK		= 0x00000400 
+DF_MASK		= 0x00000400
+IOPL1_MASK 	= 0x00001000
+IOPL2_MASK 	= 0x00002000
+IOPL_MASK 	= 0x00003000
 NT_MASK		= 0x00004000
+RF_MASK		= 0x00010000
 VM_MASK		= 0x00020000
+AC_MASK 	= 0x00040000
+VIF_MASK 	= 0x00080000
+VIP_MASK 	= 0x00100000
 
 #ifdef CONFIG_PREEMPT
 #define preempt_stop		cli
 #else
 #define preempt_stop
-#define resume_kernel		restore_all
+#define resume_kernel		resume_kernel_nopreempt
 #endif
 
 #define SAVE_ALL \
@@ -122,25 +129,6 @@
 .previous
 
 
-#define RESTORE_ALL	\
-	RESTORE_REGS	\
-	addl $4, %esp;	\
-1:	iret;		\
-.section .fixup,"ax";   \
-2:	sti;		\
-	movl $(__USER_DS), %edx; \
-	movl %edx, %ds; \
-	movl %edx, %es; \
-	pushl $11;	\
-	call do_exit;	\
-.previous;		\
-.section __ex_table,"a";\
-	.align 4;	\
-	.long 1b,2b;	\
-.previous
-
-
-
 ENTRY(lcall7)
 	pushfl			# We get a different stack layout with call
 				# gates, which has to be cleaned up later..
@@ -174,6 +162,23 @@
 	jmp do_lcall
 
 
+ENTRY(espfix_trampoline)
+	popl %esp
+espfix_past_esp:
+1:	iret
+.section .fixup,"ax"
+2:	sti
+	movl $(__USER_DS), %edx
+	movl %edx, %ds
+	movl %edx, %es
+	pushl $11
+	call do_exit
+.previous
+.section __ex_table,"a"
+	.align 4
+	.long 1b,2b
+.previous
+
 ENTRY(ret_from_fork)
 	pushl %eax
 	call schedule_tail
@@ -196,8 +201,8 @@
 	GET_THREAD_INFO(%ebp)
 	movl EFLAGS(%esp), %eax		# mix EFLAGS and CS
 	movb CS(%esp), %al
-	testl $(VM_MASK | 3), %eax
-	jz resume_kernel		# returning to kernel or vm86-space
+	testl $(VM_MASK | 2), %eax
+	jz resume_kernel
 ENTRY(resume_userspace)
  	cli				# make sure we don't miss an interrupt
 					# setting need_resched or sigpending
@@ -208,10 +213,15 @@
 	jne work_pending
 	jmp restore_all
 
+resume_kernel_nopreempt:
+	RESTORE_REGS
+	addl $4, %esp
+	iret
+
 #ifdef CONFIG_PREEMPT
 ENTRY(resume_kernel)
 	cmpl $0,TI_preempt_count(%ebp)	# non-zero preempt_count ?
-	jnz restore_all
+	jnz resume_kernel_nopreempt
 need_resched:
 	movl TI_flags(%ebp), %ecx	# need_resched set ?
 	testb $_TIF_NEED_RESCHED, %cl
@@ -294,7 +304,77 @@
 	testw $_TIF_ALLWORK_MASK, %cx	# current->work
 	jne syscall_exit_work
 restore_all:
-	RESTORE_ALL
+	/* If returning to Ring-3, not to V86, and with
+	 * the small stack, try to fix the higher word of
+	 * ESP, as the CPU won't restore it from stack.
+	 * This is an "official" bug of all the x86-compatible
+	 * CPUs, which we can try to work around to make
+	 * dosemu happy. */
+	movl EFLAGS(%esp), %eax
+	movb CS(%esp), %al
+	andl $(VM_MASK | 2), %eax
+	/* returning to user-space and not to v86? */
+	cmpl $2, %eax
+	jne 3f
+	larl OLDSS(%esp), %eax
+	/* returning to "small" stack? */
+	testl $0x00400000, %eax
+3:	RESTORE_REGS
+	jz 8f		# go fixing ESP instead of just to return :(
+	addl $4, %esp
+1:	iret
+.section .fixup,"ax"
+2:	sti
+	movl $(__USER_DS), %edx
+	movl %edx, %ds
+	movl %edx, %es
+	pushl $11
+	call do_exit
+.previous
+.section __ex_table,"a"
+	.align 4
+	.long 1b,2b
+.previous
+	/* preparing the ESPfix here */
+#define NMI_STACK_RESERVE 0x800
+	/* reserve some space on stack for the NMI handler */
+8:	subl $(NMI_STACK_RESERVE-4), %esp
+	.rept 5
+	pushl NMI_STACK_RESERVE+16(%esp)
+	.endr
+	subl $24, %esp
+	pushl %eax
+	leal 24(%esp), %eax
+	pushl %ebp
+	preempt_stop
+	GET_THREAD_INFO(%ebp)
+	movl TI_cpu(%ebp), %ebp
+	shll $GDT_SIZE_SHIFT, %ebp
+	/* find GDT of the proper CPU */
+	addl $cpu_gdt_table, %ebp
+	/* patch the base of the ring-1 16bit stack */
+	movw %ax, ((GDT_ENTRY_ESPFIX_SS << GDT_ENTRY_SHIFT) + 2)(%ebp)
+	shrl $16, %eax
+	movb %al, ((GDT_ENTRY_ESPFIX_SS << GDT_ENTRY_SHIFT) + 4)(%ebp)
+	movb %ah, ((GDT_ENTRY_ESPFIX_SS << GDT_ENTRY_SHIFT) + 7)(%ebp)
+	popl %ebp
+	/* push the ESP value to preload on ring-1 */
+	movl 28+12(%esp), %eax
+	movw $4, %ax
+	movl %eax, 4+20(%esp)
+	/* push the iret frame for our ring-1 trampoline */
+	movl $__ESPFIX_SS, 4+16(%esp)
+	movl $0, 4+12(%esp)
+	/* push ring-3 flags only to get the IOPL right */
+	movl 28+8(%esp), %eax
+	andl $IOPL_MASK, %eax
+	/* we need at least IOPL1 to re-enable interrupts */
+	orl $IOPL1_MASK, %eax
+	movl %eax, 4+8(%esp)
+	popl %eax
+	movl $__ESPFIX_CS, 4(%esp)
+	movl $espfix_trampoline, (%esp)
+	iret
 
 	# perform work that needs to be done immediately before resumption
 	ALIGN
@@ -504,7 +584,12 @@
 ENTRY(nmi)
 	cmpl $sysenter_entry,(%esp)
 	je nmi_stack_fixup
-	pushl %eax
+	cmpl $espfix_past_esp,(%esp)
+	jne 1f
+	/* restart the "popl %esp" */
+	decl (%esp)
+	subw $4, 12(%esp)
+1:	pushl %eax
 	movl %esp,%eax
 	/* Do not access memory above the end of our stack page,
 	 * it might not exist.
@@ -523,7 +608,7 @@
 	pushl %edx
 	call do_nmi
 	addl $8, %esp
-	RESTORE_ALL
+	jmp restore_all
 
 nmi_stack_fixup:
 	FIX_STACK(12,nmi_stack_correct, 1)
diff -urN linux-2.6.8-pcsp/arch/i386/kernel/head.S linux-2.6.8-stacks/arch/i386/kernel/head.S
--- linux-2.6.8-pcsp/arch/i386/kernel/head.S	2004-08-10 11:02:36.000000000 +0400
+++ linux-2.6.8-stacks/arch/i386/kernel/head.S	2004-09-24 13:57:24.000000000 +0400
@@ -514,8 +514,8 @@
 	.quad 0x00009a0000000000	/* 0xc0 APM CS 16 code (16 bit) */
 	.quad 0x0040920000000000	/* 0xc8 APM DS    data */
 
-	.quad 0x0000000000000000	/* 0xd0 - unused */
-	.quad 0x0000000000000000	/* 0xd8 - unused */
+	.quad 0x00cfba000000ffff	/* 0xd0 - ESPfix 32-bit CS */
+	.quad 0x0000b20000000017	/* 0xd8 - ESPfix 16-bit SS */
 	.quad 0x0000000000000000	/* 0xe0 - unused */
 	.quad 0x0000000000000000	/* 0xe8 - unused */
 	.quad 0x0000000000000000	/* 0xf0 - unused */
diff -urN linux-2.6.8-pcsp/arch/i386/kernel/traps.c linux-2.6.8-stacks/arch/i386/kernel/traps.c
--- linux-2.6.8-pcsp/arch/i386/kernel/traps.c	2004-08-10 11:02:36.000000000 +0400
+++ linux-2.6.8-stacks/arch/i386/kernel/traps.c	2004-09-23 17:54:52.000000000 +0400
@@ -210,7 +210,7 @@
 
 	esp = (unsigned long) (&regs->esp);
 	ss = __KERNEL_DS;
-	if (regs->xcs & 3) {
+	if (regs->xcs & 2) {
 		in_kernel = 0;
 		esp = regs->esp;
 		ss = regs->xss & 0xffff;
@@ -264,7 +264,7 @@
 	char c;
 	unsigned long eip;
 
-	if (regs->xcs & 3)
+	if (regs->xcs & 2)
 		goto no_bug;		/* Not in kernel */
 
 	eip = regs->eip;
@@ -335,7 +335,7 @@
 
 static inline void die_if_kernel(const char * str, struct pt_regs * regs, long err)
 {
-	if (!(regs->eflags & VM_MASK) && !(3 & regs->xcs))
+	if (!(regs->eflags & VM_MASK) && !(2 & regs->xcs))
 		die(str, regs, err);
 }
 
@@ -357,7 +357,7 @@
 		goto trap_signal;
 	}
 
-	if (!(regs->xcs & 3))
+	if (!(regs->xcs & 2))
 		goto kernel_trap;
 
 	trap_signal: {
@@ -437,7 +437,7 @@
 	if (regs->eflags & VM_MASK)
 		goto gp_in_vm86;
 
-	if (!(regs->xcs & 3))
+	if (!(regs->xcs & 2))
 		goto gp_in_kernel;
 
 	current->thread.error_code = error_code;
@@ -614,7 +614,7 @@
 		 * allowing programs to debug themselves without the ptrace()
 		 * interface.
 		 */
-		if ((regs->xcs & 3) == 0)
+		if ((regs->xcs & 2) == 0)
 			goto clear_TF_reenable;
 		if ((tsk->ptrace & (PT_DTRACE|PT_PTRACED)) == PT_DTRACE)
 			goto clear_TF;
@@ -630,7 +630,7 @@
 	/* If this is a kernel mode trap, save the user PC on entry to 
 	 * the kernel, that's what the debugger can make sense of.
 	 */
-	info.si_addr = ((regs->xcs & 3) == 0) ? (void *)tsk->thread.eip : 
+	info.si_addr = ((regs->xcs & 3) != 3) ? (void *)tsk->thread.eip : 
 	                                        (void *)regs->eip;
 	force_sig_info(SIGTRAP, &info, tsk);
 
diff -urN linux-2.6.8-pcsp/arch/i386/mm/extable.c linux-2.6.8-stacks/arch/i386/mm/extable.c
--- linux-2.6.8-pcsp/arch/i386/mm/extable.c	2004-06-09 15:44:19.000000000 +0400
+++ linux-2.6.8-stacks/arch/i386/mm/extable.c	2004-09-23 17:54:52.000000000 +0400
@@ -26,6 +26,11 @@
 	}
 #endif
 
+	if (unlikely(regs->xcs == __ESPFIX_CS))
+	{
+		regs->xcs = __KERNEL_CS;
+	}
+
 	fixup = search_exception_tables(regs->eip);
 	if (fixup) {
 		regs->eip = fixup->fixup;
diff -urN linux-2.6.8-pcsp/arch/i386/mm/fault.c linux-2.6.8-stacks/arch/i386/mm/fault.c
--- linux-2.6.8-pcsp/arch/i386/mm/fault.c	2004-08-10 11:02:37.000000000 +0400
+++ linux-2.6.8-stacks/arch/i386/mm/fault.c	2004-09-23 17:54:52.000000000 +0400
@@ -77,7 +77,7 @@
 	u32 seg_ar, seg_limit, base, *desc;
 
 	/* The standard kernel/user address space limit. */
-	*eip_limit = (seg & 3) ? USER_DS.seg : KERNEL_DS.seg;
+	*eip_limit = (seg & 2) ? USER_DS.seg : KERNEL_DS.seg;
 
 	/* Unlikely, but must come before segment checks. */
 	if (unlikely((regs->eflags & VM_MASK) != 0))
diff -urN linux-2.6.8-pcsp/include/asm-i386/segment.h linux-2.6.8-stacks/include/asm-i386/segment.h
--- linux-2.6.8-pcsp/include/asm-i386/segment.h	2004-01-09 09:59:19.000000000 +0300
+++ linux-2.6.8-stacks/include/asm-i386/segment.h	2004-09-23 17:54:52.000000000 +0400
@@ -38,8 +38,8 @@
  *  24 - APM BIOS support
  *  25 - APM BIOS support 
  *
- *  26 - unused
- *  27 - unused
+ *  26 - ESPfix Ring-1 trampoline CS
+ *  27 - ESPfix Ring-1 small SS
  *  28 - unused
  *  29 - unused
  *  30 - unused
@@ -71,14 +71,21 @@
 #define GDT_ENTRY_PNPBIOS_BASE		(GDT_ENTRY_KERNEL_BASE + 6)
 #define GDT_ENTRY_APMBIOS_BASE		(GDT_ENTRY_KERNEL_BASE + 11)
 
+#define GDT_ENTRY_ESPFIX_CS		(GDT_ENTRY_KERNEL_BASE + 14)
+#define GDT_ENTRY_ESPFIX_SS		(GDT_ENTRY_KERNEL_BASE + 15)
+#define __ESPFIX_CS (GDT_ENTRY_ESPFIX_CS * 8 + 1)
+#define __ESPFIX_SS (GDT_ENTRY_ESPFIX_SS * 8 + 1)
+
 #define GDT_ENTRY_DOUBLEFAULT_TSS	31
 
 /*
  * The GDT has 32 entries
  */
-#define GDT_ENTRIES 32
-
-#define GDT_SIZE (GDT_ENTRIES * 8)
+#define GDT_ENTRIES_SHIFT 5
+#define GDT_ENTRIES (1 << GDT_ENTRIES_SHIFT)
+#define GDT_ENTRY_SHIFT 3
+#define GDT_SIZE_SHIFT (GDT_ENTRIES_SHIFT + GDT_ENTRY_SHIFT)
+#define GDT_SIZE (1 << GDT_SIZE_SHIFT)
 
 /* Simple and small GDT entries for booting only */
 

  reply	other threads:[~2004-09-24 20:37 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-09-16 18:39 ESP corruption bug - what CPUs are affected? Petr Vandrovec
2004-09-17 18:12 ` Stas Sergeev
2004-09-18 16:45 ` Stas Sergeev
2004-09-18 16:59   ` Petr Vandrovec
2004-09-18 19:14     ` Stas Sergeev
2004-09-18 20:35       ` Petr Vandrovec
2004-09-22 18:49         ` Stas Sergeev
2004-09-22 19:19           ` Richard B. Johnson
2004-09-22 20:03             ` Stas Sergeev
2004-09-22 20:13               ` Richard B. Johnson
2004-09-28 15:43                 ` Denis Vlasenko
2004-09-22 20:02           ` Petr Vandrovec
2004-09-23  4:09             ` Stas Sergeev
2004-09-23 17:08             ` Stas Sergeev
2004-09-23 18:06               ` Petr Vandrovec
2004-09-24 20:36                 ` Stas Sergeev [this message]
2004-09-24 21:43                   ` Petr Vandrovec
2004-09-25  8:04                     ` Gabriel Paubert
2004-09-25 12:25                       ` Stas Sergeev
2004-09-25 19:18                         ` Gabriel Paubert
2004-09-25 20:40                           ` Stas Sergeev
2004-09-25 23:42                             ` Gabriel Paubert
2004-09-26 18:04                               ` Stas Sergeev
2004-09-27  9:07                                 ` Gabriel Paubert
2004-09-30 15:11                               ` Bill Davidsen
2004-10-06 16:18                     ` ESP corruption bug - what CPUs are affected? (patch attempts) Stas Sergeev
  -- strict thread matches above, loose matches on Subject: below --
2004-10-06 17:18 ESP corruption bug - what CPUs are affected? (patch att Petr Vandrovec
2004-10-11 18:32 ` ESP corruption bug - what CPUs are affected? Stas Sergeev
2004-09-16 17:49 Stas Sergeev
2004-09-16 19:03 ` Denis Vlasenko
2004-09-17 18:13   ` Stas Sergeev
2004-09-17 22:04     ` Denis Vlasenko
2004-09-18 10:58       ` Stas Sergeev
2004-09-18 13:08         ` Denis Vlasenko
2004-09-18 17:05           ` Stas Sergeev
     [not found]             ` <200409190108.45641.vda@port.imtp.ilyichevsk.odessa.ua>
2004-09-22 19:05               ` Stas Sergeev
2004-09-21 11:19         ` Pavel Machek
2004-09-21 11:43           ` Denis Vlasenko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4154853F.6070105@aknet.ru \
    --to=stsp@aknet.ru \
    --cc=linux-kernel@vger.kernel.org \
    --cc=vandrove@vc.cvut.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).