Re: [PATCH] 3/5 explicit-iopl

All of lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH] 3/5 explicit-iopl
       [not found] <200508040043.j740hi0R004184@zach-dev.vmware.com.suse.lists.linux.kernel>
@ 2005-08-04 12:19 ` Andi Kleen
  2005-08-04 15:29   ` Zachary Amsden
  0 siblings, 1 reply; 4+ messages in thread
From: Andi Kleen @ 2005-08-04 12:19 UTC (permalink / raw)
  To: zach; +Cc: linux-kernel

zach@vmware.com writes:
> 
> Unfortunately, this added one field to the thread_struct.  But as a bonus, on
> P4, the fastest time measured for switch_to() went from 312 to 260 cycles, a
> win of about 17% in the fast case through this performance critical path.

Cool! Definitely want this on x86-64 too.

Can we perhaps get rid of the PUSHF/POPF in the SYSENTER syscall path too?
iirc they were only for single stepping. But SYSENTER doesn't disable
single stepping, so the debug handler could detect this and set
some magic flag that restores it on syscall exit.

-Andi

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] 3/5 explicit-iopl
  2005-08-04 12:19 ` [PATCH] 3/5 explicit-iopl Andi Kleen
@ 2005-08-04 15:29   ` Zachary Amsden
  2005-08-04 15:46     ` Andi Kleen
  0 siblings, 1 reply; 4+ messages in thread
From: Zachary Amsden @ 2005-08-04 15:29 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel

Andi Kleen wrote:

>zach@vmware.com writes:
>  
>
>>Unfortunately, this added one field to the thread_struct.  But as a bonus, on
>>P4, the fastest time measured for switch_to() went from 312 to 260 cycles, a
>>win of about 17% in the fast case through this performance critical path.
>>    
>>
>
>Cool! Definitely want this on x86-64 too.
>  
>

Well... maybe.  On Opteron and/or Intel EMT it may not be a win.  The 
cost of the branch could overtake the cost of the POPF (that's the 
expensive one).  Grrr.

>Can we perhaps get rid of the PUSHF/POPF in the SYSENTER syscall path too?
>iirc they were only for single stepping. But SYSENTER doesn't disable
>single stepping, so the debug handler could detect this and set
>some magic flag that restores it on syscall exit.
>  
>

A context switch requires IRET, which requires the flags to be saved, so 
you can't eliminate the pushf (*) IIRC, the popf is already omitted.  
Many of these patches may be beneficial to x86-64, but. unfortunately 
the performance deltas may not translate.  Lets hope they do!  
Unfortunately, that requires re-measuring the cost of switch_to(), which 
was quite amusing to do.  I can send you diffs if you're interested, but 
using printk around this path turned out to be a really bad idea ;)  I 
really would like to bring some of the cleanup and performance work I've 
done on i386 over to x86_64 as well, but that is still probably a couple 
of weeks out.  If you can't wait, you're welcome to port pieces you 
like!  Let me know.

(*) Well, you could.  It's just that system calls would have to clobber 
flags - hmm.. sysenter based calls already do. But I'm not 100% sure 
there isn't some bogon case where kernel preemption could cause you a 
problem.  Keeping around the fake IRET frame still appears to be a good 
thing to do just for the benefit of ptrace / debug functionality.  PUSHF 
is cheap on every core I have measured on.

Zach

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] 3/5 explicit-iopl
  2005-08-04 15:29   ` Zachary Amsden
@ 2005-08-04 15:46     ` Andi Kleen
  0 siblings, 0 replies; 4+ messages in thread
From: Andi Kleen @ 2005-08-04 15:46 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: Andi Kleen, linux-kernel

> Well... maybe.  On Opteron and/or Intel EMT it may not be a win.  The 
> cost of the branch could overtake the cost of the POPF (that's the 
> expensive one).  Grrr.

Not on Opteron, but probably on Intel.

iirc popf will actually flush parts of the trace cache, while
a branch shouldn't do that. So I expect it to be a win.


> 
> >Can we perhaps get rid of the PUSHF/POPF in the SYSENTER syscall path too?
> >iirc they were only for single stepping. But SYSENTER doesn't disable
> >single stepping, so the debug handler could detect this and set
> >some magic flag that restores it on syscall exit.
> > 
> >
> 
> A context switch requires IRET, which requires the flags to be saved, so 
> you can't eliminate the pushf (*) IIRC, the popf is already omitted.  

If we have iopl and TF elsewhere then the flags could just be replaced
with a default value. 

Drawback would be that it would be slightly incompatible to before,
but then it's just a function call and function calls are not 
required to preserve flags.

> (*) Well, you could.  It's just that system calls would have to clobber 
> flags - hmm.. sysenter based calls already do. But I'm not 100% sure 

They do pushf. 

> there isn't some bogon case where kernel preemption could cause you a 
> problem.  Keeping around the fake IRET frame still appears to be a good 
> thing to do just for the benefit of ptrace / debug functionality.  PUSHF 
> is cheap on every core I have measured on.

Are you sure? I thought it was expensive on Northwood at least.
Haven't measured on a Prescott or newer.

-Andi

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH] 3/5 explicit-iopl
@ 2005-08-04  0:43 zach
  0 siblings, 0 replies; 4+ messages in thread
From: zach @ 2005-08-04  0:43 UTC (permalink / raw)
  To: akpm, chrisl, davej, hpa, linux-kernel, pratap, Riley, zach

The pushf/popf in switch_to are ONLY used to switch IOPL.  Making this
explicit in C code is more clear.  This pushf/popf pair was added as a
bugfix for leaking IOPL to unprivileged processes when using sysenter/sysexit
based system calls (sysexit does not restore flags).

When requesting an IOPL change in sys_iopl(), it is just as easy to
change the current flags and the flags in the stack image (in case an IRET
is required), but there is no reason to force an IRET if we came in from the
SYSENTER path.

This change is the minimal solution for supporting a paravirtualized Linux
kernel that allows user processes to run with I/O privilege.  Other solutions
require radical rewrites of part of the low level fault / system call handling
code, or do not fully support sysenter based system calls.

Unfortunately, this added one field to the thread_struct.  But as a bonus, on
P4, the fastest time measured for switch_to() went from 312 to 260 cycles, a
win of about 17% in the fast case through this performance critical path.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Index: linux-2.6.13/arch/i386/kernel/ioport.c
===================================================================
--- linux-2.6.13.orig/arch/i386/kernel/ioport.c	2005-06-17 12:48:29.000000000 -0700
+++ linux-2.6.13/arch/i386/kernel/ioport.c	2005-08-02 17:11:01.000000000 -0700
@@ -132,6 +132,7 @@
 	volatile struct pt_regs * regs = (struct pt_regs *) &unused;
 	unsigned int level = regs->ebx;
 	unsigned int old = (regs->eflags >> 12) & 3;
+	struct thread_struct * t = &current->thread;
 
 	if (level > 3)
 		return -EINVAL;
@@ -140,8 +141,8 @@
 		if (!capable(CAP_SYS_RAWIO))
 			return -EPERM;
 	}
-	regs->eflags = (regs->eflags &~ 0x3000UL) | (level << 12);
-	/* Make sure we return the long way (not sysenter) */
-	set_thread_flag(TIF_IRET);
+	t->iopl = level << 12;
+	regs->eflags = (regs->eflags & ~X86_EFLAGS_IOPL) | t->iopl;
+	set_iopl_mask(t->iopl);
 	return 0;
 }
Index: linux-2.6.13/arch/i386/kernel/process.c
===================================================================
--- linux-2.6.13.orig/arch/i386/kernel/process.c	2005-08-02 17:08:58.000000000 -0700
+++ linux-2.6.13/arch/i386/kernel/process.c	2005-08-02 17:11:01.000000000 -0700
@@ -716,6 +716,13 @@
 		loadsegment(gs, next->gs);
 
 	/*
+	 * Restore IOPL if needed.
+	 */
+	if (unlikely(prev->iopl != next->iopl)) {
+		set_iopl_mask(next->iopl);
+	}
+
+	/*
 	 * Now maybe reload the debug registers
 	 */
 	if (unlikely(next->debugreg[7])) {
Index: linux-2.6.13/include/asm-i386/processor.h
===================================================================
--- linux-2.6.13.orig/include/asm-i386/processor.h	2005-08-02 17:06:23.000000000 -0700
+++ linux-2.6.13/include/asm-i386/processor.h	2005-08-02 17:11:27.000000000 -0700
@@ -457,6 +457,7 @@
 	unsigned long	*io_bitmap_ptr;
 /* max allowed port in the bitmap, in bytes: */
 	unsigned long	io_bitmap_max;
+ 	unsigned long	iopl;
 /* performance counters */
 	struct vperfctr *perfctr;
 };
@@ -514,6 +515,21 @@
 			: /* no output */			\
 			:"r" (value))
 
+/*
+ * Set IOPL bits in EFLAGS from given mask
+ */
+static inline void set_iopl_mask(unsigned mask)
+{
+	unsigned int reg;
+	__asm__ __volatile__ ("pushfl;"
+			      "popl %0;"
+			      "andl %1, %0;"
+			      "orl %2, %0;"
+			      "pushl %0;"
+			      "popfl"
+				: "=&r" (reg)
+				: "i" (~X86_EFLAGS_IOPL), "r" (mask));
+}
 
 /* Forward declaration, a strange C thing */
 struct task_struct;
Index: linux-2.6.13/include/asm-i386/system.h
===================================================================
--- linux-2.6.13.orig/include/asm-i386/system.h	2005-08-02 17:06:23.000000000 -0700
+++ linux-2.6.13/include/asm-i386/system.h	2005-08-02 17:11:49.000000000 -0700
@@ -15,8 +15,7 @@
 #define switch_to(prev,next,last) do {					\
 	unsigned long esi,edi;						\
  	perfctr_suspend_thread(&(prev)->thread);			\
-	asm volatile("pushfl\n\t"					\
-		     "pushl %%ebp\n\t"					\
+	asm volatile("pushl %%ebp\n\t"					\
 		     "movl %%esp,%0\n\t"	/* save ESP */		\
 		     "movl %5,%%esp\n\t"	/* restore ESP */	\
 		     "movl $1f,%1\n\t"		/* save EIP */		\
@@ -24,7 +23,6 @@
 		     "jmp __switch_to\n"				\
 		     "1:\t"						\
 		     "popl %%ebp\n\t"					\
-		     "popfl"						\
 		     :"=m" (prev->thread.esp),"=m" (prev->thread.eip),	\
 		      "=a" (last),"=S" (esi),"=D" (edi)			\
 		     :"m" (next->thread.esp),"m" (next->thread.eip),	\

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2005-08-04 15:50 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <200508040043.j740hi0R004184@zach-dev.vmware.com.suse.lists.linux.kernel>
2005-08-04 12:19 ` [PATCH] 3/5 explicit-iopl Andi Kleen
2005-08-04 15:29   ` Zachary Amsden
2005-08-04 15:46     ` Andi Kleen
2005-08-04  0:43 zach

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.