public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [patch] lazy TSS's I/O bitmap copy ...
@ 2004-08-23 21:23 Davide Libenzi
  2004-08-23 21:32 ` Andi Kleen
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Davide Libenzi @ 2004-08-23 21:23 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Andy Kleen, Andrew Morton, Linus Torvalds


The following patch implements a lazy I/O bitmap copy for the i386 
architecture. With I/O bitmaps now reaching considerable sizes, if the 
switched task does not perform any I/O operation, we can save the copy 
altogether. In my box X is working fine with the following patch, even if 
more test would be required.


Signed-off-by: Davide Libenzi <davidel@xmailserver.org>


- Davide



arch/i386/kernel/process.c   |   33 +++++++++++----------------------
arch/i386/kernel/traps.c     |   20 ++++++++++++++++++++
include/asm-i386/processor.h |    1 +
3 files changed, 32 insertions(+), 22 deletions(-)



Index: arch/i386/kernel/process.c
===================================================================
RCS file: /usr/src/bkcvs/linux-2.5/arch/i386/kernel/process.c,v
retrieving revision 1.82
diff -u -r1.82 process.c
--- a/arch/i386/kernel/process.c	14 Jul 2004 16:27:05 -0000	1.82
+++ b/arch/i386/kernel/process.c	23 Aug 2004 19:33:12 -0000
@@ -551,28 +551,17 @@
 		loaddebug(next, 7);
 	}
 
-	if (unlikely(prev->io_bitmap_ptr || next->io_bitmap_ptr)) {
-		if (next->io_bitmap_ptr) {
-			/*
-			 * 4 cachelines copy ... not good, but not that
-			 * bad either. Anyone got something better?
-			 * This only affects processes which use ioperm().
-			 * [Putting the TSSs into 4k-tlb mapped regions
-			 * and playing VM tricks to switch the IO bitmap
-			 * is not really acceptable.]
-			 */
-			memcpy(tss->io_bitmap, next->io_bitmap_ptr,
-				IO_BITMAP_BYTES);
-			tss->io_bitmap_base = IO_BITMAP_OFFSET;
-		} else
-			/*
-			 * a bitmap offset pointing outside of the TSS limit
-			 * causes a nicely controllable SIGSEGV if a process
-			 * tries to use a port IO instruction. The first
-			 * sys_ioperm() call sets up the bitmap properly.
-			 */
-			tss->io_bitmap_base = INVALID_IO_BITMAP_OFFSET;
-	}
+	/*
+	 * Lazy TSS's I/O bitmap copy. We set an invalid offset here and
+	 * we let the task to get a GPF in case an I/O instruction is performed.
+	 * The handler of the GPF will verify that the faulting task has a valid
+	 * I/O bitmap and, it true, does the real copy and restart the instruction.
+	 * This will save us for redoundant copies when the currently switched task
+	 * does not perform any I/O during its timeslice.
+	 */
+	tss->io_bitmap_base = next->io_bitmap_ptr ? INVALID_IO_BITMAP_OFFSET_LAZY:
+		INVALID_IO_BITMAP_OFFSET;
+
 	return prev_p;
 }
 
Index: arch/i386/kernel/traps.c
===================================================================
RCS file: /usr/src/bkcvs/linux-2.5/arch/i386/kernel/traps.c,v
retrieving revision 1.77
diff -u -r1.77 traps.c
--- a/arch/i386/kernel/traps.c	13 Jul 2004 18:02:33 -0000	1.77
+++ b/arch/i386/kernel/traps.c	23 Aug 2004 19:34:52 -0000
@@ -431,6 +431,26 @@
 
 asmlinkage void do_general_protection(struct pt_regs * regs, long error_code)
 {
+	int cpu = smp_processor_id();
+	struct tss_struct *tss = init_tss + cpu;
+	struct task_struct *tsk = current;
+	struct thread_struct *tsk_th = &tsk->thread;
+
+	/*
+	 * Perform the lazy TSS's I/O bitmap copy. If the TSS has an
+	 * invalid offset set (the LAZY one) and the faulting thread has
+	 * a valid I/O bitmap pointer, we copy the I/O bitmap in the TSS
+	 * and we set the offset field correctly. Then we let the CPU to
+	 * restart the faulting instruction.
+	 */
+	if (tss->io_bitmap_base == INVALID_IO_BITMAP_OFFSET_LAZY &&
+	    tsk_th->io_bitmap_ptr) {
+		memcpy(tss->io_bitmap, tsk_th->io_bitmap_ptr,
+		       IO_BITMAP_BYTES);
+		tss->io_bitmap_base = IO_BITMAP_OFFSET;
+		return;
+	}
+
 	if (regs->eflags & X86_EFLAGS_IF)
 		local_irq_enable();
  
Index: include/asm-i386/processor.h
===================================================================
RCS file: /usr/src/bkcvs/linux-2.5/include/asm-i386/processor.h,v
retrieving revision 1.68
diff -u -r1.68 processor.h
--- a/include/asm-i386/processor.h	27 Jun 2004 17:53:48 -0000	1.68
+++ b/include/asm-i386/processor.h	23 Aug 2004 19:32:58 -0000
@@ -304,6 +304,7 @@
 #define IO_BITMAP_LONGS (IO_BITMAP_BYTES/sizeof(long))
 #define IO_BITMAP_OFFSET offsetof(struct tss_struct,io_bitmap)
 #define INVALID_IO_BITMAP_OFFSET 0x8000
+#define INVALID_IO_BITMAP_OFFSET_LAZY 0x9000
 
 struct i387_fsave_struct {
 	long	cwd;

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [patch] lazy TSS's I/O bitmap copy ...
  2004-08-23 21:23 [patch] lazy TSS's I/O bitmap copy Davide Libenzi
@ 2004-08-23 21:32 ` Andi Kleen
  2004-08-23 21:39   ` Davide Libenzi
  2004-08-24  1:53 ` [patch] lazy TSS's I/O bitmap copy Brian Gerst
  2004-08-24  6:51 ` Arjan van de Ven
  2 siblings, 1 reply; 19+ messages in thread
From: Andi Kleen @ 2004-08-23 21:32 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: linux-kernel, akpm, torvalds

On Mon, 23 Aug 2004 14:23:35 -0700 (PDT)
Davide Libenzi <davidel@xmailserver.org> wrote:

> 
> The following patch implements a lazy I/O bitmap copy for the i386 
> architecture. With I/O bitmaps now reaching considerable sizes, if the 
> switched task does not perform any I/O operation, we can save the copy 
> altogether. In my box X is working fine with the following patch, even if 
> more test would be required.

IMHO this needs benchmarks first to prove that the additional 
exception doesn't cause too much slow down.

>  asmlinkage void do_general_protection(struct pt_regs * regs, long error_code)
>  {
> +	int cpu = smp_processor_id();
> +	struct tss_struct *tss = init_tss + cpu;
> +	struct task_struct *tsk = current;
> +	struct thread_struct *tsk_th = &tsk->thread;
> +
> +	/*
> +	 * Perform the lazy TSS's I/O bitmap copy. If the TSS has an
> +	 * invalid offset set (the LAZY one) and the faulting thread has
> +	 * a valid I/O bitmap pointer, we copy the I/O bitmap in the TSS
> +	 * and we set the offset field correctly. Then we let the CPU to
> +	 * restart the faulting instruction.
> +	 */

I don't like it very much that most GPFs will be executed twice now
when the process has ioperm enabled.
This will confuse debuggers and could have other bad side effects.
Checking the EIP would be better.

-Andi

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [patch] lazy TSS's I/O bitmap copy ...
  2004-08-23 21:32 ` Andi Kleen
@ 2004-08-23 21:39   ` Davide Libenzi
  2004-08-23 22:07     ` Linus Torvalds
  2004-08-24  7:19     ` [patch] ioport-cache-2.6.8.1.patch Ingo Molnar
  0 siblings, 2 replies; 19+ messages in thread
From: Davide Libenzi @ 2004-08-23 21:39 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Linux Kernel Mailing List, Andrew Morton, Linus Torvalds

On Mon, 23 Aug 2004, Andi Kleen wrote:

> On Mon, 23 Aug 2004 14:23:35 -0700 (PDT)
> Davide Libenzi <davidel@xmailserver.org> wrote:
> 
> > 
> > The following patch implements a lazy I/O bitmap copy for the i386 
> > architecture. With I/O bitmaps now reaching considerable sizes, if the 
> > switched task does not perform any I/O operation, we can save the copy 
> > altogether. In my box X is working fine with the following patch, even if 
> > more test would be required.
> 
> IMHO this needs benchmarks first to prove that the additional 
> exception doesn't cause too much slow down.

Yes, of course.


> >  asmlinkage void do_general_protection(struct pt_regs * regs, long error_code)
> >  {
> > +	int cpu = smp_processor_id();
> > +	struct tss_struct *tss = init_tss + cpu;
> > +	struct task_struct *tsk = current;
> > +	struct thread_struct *tsk_th = &tsk->thread;
> > +
> > +	/*
> > +	 * Perform the lazy TSS's I/O bitmap copy. If the TSS has an
> > +	 * invalid offset set (the LAZY one) and the faulting thread has
> > +	 * a valid I/O bitmap pointer, we copy the I/O bitmap in the TSS
> > +	 * and we set the offset field correctly. Then we let the CPU to
> > +	 * restart the faulting instruction.
> > +	 */
> 
> I don't like it very much that most GPFs will be executed twice now
> when the process has ioperm enabled.
> This will confuse debuggers and could have other bad side effects.
> Checking the EIP would be better.

The eventually double GPF would happen only on TSS-IObmp-lazy tasks, ie 
tasks using the I/O bitmap. The check for the I/O opcode can certainly be 
done though, even if it'd make the code a little bit more complex.



- Davide


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [patch] lazy TSS's I/O bitmap copy ...
  2004-08-23 21:39   ` Davide Libenzi
@ 2004-08-23 22:07     ` Linus Torvalds
  2004-08-23 22:18       ` Davide Libenzi
  2004-08-23 22:54       ` Davide Libenzi
  2004-08-24  7:19     ` [patch] ioport-cache-2.6.8.1.patch Ingo Molnar
  1 sibling, 2 replies; 19+ messages in thread
From: Linus Torvalds @ 2004-08-23 22:07 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Andi Kleen, Linux Kernel Mailing List, Andrew Morton



On Mon, 23 Aug 2004, Davide Libenzi wrote:
> 
> The eventually double GPF would happen only on TSS-IObmp-lazy tasks, ie 
> tasks using the I/O bitmap.

You could also check for the error code (at least the low 16 bits) being 
0, I guess, just to cut down the noise.

>	 The check for the I/O opcode can certainly be 
> done though, even if it'd make the code a little bit more complex.

Have to be very careful there to avoid nasty security issues. And with
ins/outs, you can have various prefixes etc, so decoding is not as trivial
as it could otherwise be. Even the regular in/out can have a data size
overrides..

in/out is also commonly used from vm86 mode, so decoding it really needs
to get all of the segmentation base crap right too. Nasty nasty nasty. 

In short, I think that if we do this at all, I'd much rather just do the
simple "trap twice" thing that Davide did. It's too easy to get it wrong
otherwise.

Or we should make this careful decoder some generic x86 function. We're
doing user instruction decoding in a number of places already, although I 
don't know how careful they generally need to be. Sadly, I really think 
that this one needs to be one of the most careful cases due to the vm86 
usage.

		Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [patch] lazy TSS's I/O bitmap copy ...
  2004-08-23 22:07     ` Linus Torvalds
@ 2004-08-23 22:18       ` Davide Libenzi
  2004-08-23 22:27         ` Linus Torvalds
  2004-08-28 19:15         ` Alan Cox
  2004-08-23 22:54       ` Davide Libenzi
  1 sibling, 2 replies; 19+ messages in thread
From: Davide Libenzi @ 2004-08-23 22:18 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andi Kleen, Linux Kernel Mailing List, Andrew Morton

On Mon, 23 Aug 2004, Linus Torvalds wrote:

> On Mon, 23 Aug 2004, Davide Libenzi wrote:
> > 
> > The eventually double GPF would happen only on TSS-IObmp-lazy tasks, ie 
> > tasks using the I/O bitmap.
> 
> You could also check for the error code (at least the low 16 bits) being 
> 0, I guess, just to cut down the noise.

Yep, that's right there available.



> >	 The check for the I/O opcode can certainly be 
> > done though, even if it'd make the code a little bit more complex.
> 
> Have to be very careful there to avoid nasty security issues. And with
> ins/outs, you can have various prefixes etc, so decoding is not as trivial
> as it could otherwise be. Even the regular in/out can have a data size
> overrides..
> 
> in/out is also commonly used from vm86 mode, so decoding it really needs
> to get all of the segmentation base crap right too. Nasty nasty nasty. 
> 
> In short, I think that if we do this at all, I'd much rather just do the
> simple "trap twice" thing that Davide did. It's too easy to get it wrong
> otherwise.

IMHO since the GPF path is not a fast path like a page fault, we shouldn't 
be in bad shape with the current approach. No?



- Davide


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [patch] lazy TSS's I/O bitmap copy ...
  2004-08-23 22:18       ` Davide Libenzi
@ 2004-08-23 22:27         ` Linus Torvalds
  2004-08-28 19:15         ` Alan Cox
  1 sibling, 0 replies; 19+ messages in thread
From: Linus Torvalds @ 2004-08-23 22:27 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Andi Kleen, Linux Kernel Mailing List, Andrew Morton



On Mon, 23 Aug 2004, Davide Libenzi wrote:
> 
> IMHO since the GPF path is not a fast path like a page fault, we shouldn't 
> be in bad shape with the current approach. No?

I agree. Mostly. It might actually be a fairly critical path for some 
things, though. GP's happen with high frequency in emulation environments, 
ie I'd expect both vmware and dosemu to have just _tons_ of them.

Of course, in those emulation environments you usually have other things 
going on too (ie signal handling costs will likely dwarf everything else). 
So who knows..

		Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [patch] lazy TSS's I/O bitmap copy ...
  2004-08-23 22:07     ` Linus Torvalds
  2004-08-23 22:18       ` Davide Libenzi
@ 2004-08-23 22:54       ` Davide Libenzi
  2004-08-23 23:09         ` Linus Torvalds
  1 sibling, 1 reply; 19+ messages in thread
From: Davide Libenzi @ 2004-08-23 22:54 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andi Kleen, Linux Kernel Mailing List, Andrew Morton

On Mon, 23 Aug 2004, Linus Torvalds wrote:

> On Mon, 23 Aug 2004, Davide Libenzi wrote:
> > 
> > The eventually double GPF would happen only on TSS-IObmp-lazy tasks, ie 
> > tasks using the I/O bitmap.
> 
> You could also check for the error code (at least the low 16 bits) being 
> 0, I guess, just to cut down the noise.

I think, not sure though (gonna test right now), that the "Segment 
Selector Index" part of the error code might be the TSS selector index, 
that will enable an even more selective reissue.



- Davide


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [patch] lazy TSS's I/O bitmap copy ...
  2004-08-23 22:54       ` Davide Libenzi
@ 2004-08-23 23:09         ` Linus Torvalds
  2004-08-23 23:33           ` Davide Libenzi
  0 siblings, 1 reply; 19+ messages in thread
From: Linus Torvalds @ 2004-08-23 23:09 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Andi Kleen, Linux Kernel Mailing List, Andrew Morton



On Mon, 23 Aug 2004, Davide Libenzi wrote:
> 
> I think, not sure though (gonna test right now), that the "Segment 
> Selector Index" part of the error code might be the TSS selector index, 
> that will enable an even more selective reissue.

I don't think so. Generally the error code is 0 for all normal GP cases. 
The error code tends to be non-zero only for the "load segment" things, 
when it shows what the incorrect segment was.

So I bet you'll see a zero error code and that checking for it won't 
actually cut down _that_ much on the double GP faults.

		Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [patch] lazy TSS's I/O bitmap copy ...
  2004-08-23 23:09         ` Linus Torvalds
@ 2004-08-23 23:33           ` Davide Libenzi
  0 siblings, 0 replies; 19+ messages in thread
From: Davide Libenzi @ 2004-08-23 23:33 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andi Kleen, Linux Kernel Mailing List, Andrew Morton

On Mon, 23 Aug 2004, Linus Torvalds wrote:

> On Mon, 23 Aug 2004, Davide Libenzi wrote:
> > 
> > I think, not sure though (gonna test right now), that the "Segment 
> > Selector Index" part of the error code might be the TSS selector index, 
> > that will enable an even more selective reissue.
> 
> I don't think so. Generally the error code is 0 for all normal GP cases. 
> The error code tends to be non-zero only for the "load segment" things, 
> when it shows what the incorrect segment was.

Indeed, zero is.



- Davide


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [patch] lazy TSS's I/O bitmap copy ...
  2004-08-23 21:23 [patch] lazy TSS's I/O bitmap copy Davide Libenzi
  2004-08-23 21:32 ` Andi Kleen
@ 2004-08-24  1:53 ` Brian Gerst
  2004-08-24  2:17   ` Linus Torvalds
  2004-08-24  4:30   ` Davide Libenzi
  2004-08-24  6:51 ` Arjan van de Ven
  2 siblings, 2 replies; 19+ messages in thread
From: Brian Gerst @ 2004-08-24  1:53 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Linux Kernel Mailing List, Andy Kleen, Andrew Morton,
	Linus Torvalds

Davide Libenzi wrote:
> The following patch implements a lazy I/O bitmap copy for the i386 
> architecture. With I/O bitmaps now reaching considerable sizes, if the 
> switched task does not perform any I/O operation, we can save the copy 
> altogether. In my box X is working fine with the following patch, even if 
> more test would be required.
> 
> 
> Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
> 
> 
> - Davide
> 
> 
> 
> arch/i386/kernel/process.c   |   33 +++++++++++----------------------
> arch/i386/kernel/traps.c     |   20 ++++++++++++++++++++
> include/asm-i386/processor.h |    1 +
> 3 files changed, 32 insertions(+), 22 deletions(-)
> 
> 
> 
> Index: arch/i386/kernel/process.c
> ===================================================================
> RCS file: /usr/src/bkcvs/linux-2.5/arch/i386/kernel/process.c,v
> retrieving revision 1.82
> diff -u -r1.82 process.c
> --- a/arch/i386/kernel/process.c	14 Jul 2004 16:27:05 -0000	1.82
> +++ b/arch/i386/kernel/process.c	23 Aug 2004 19:33:12 -0000
> @@ -551,28 +551,17 @@
>  		loaddebug(next, 7);
>  	}
>  
> -	if (unlikely(prev->io_bitmap_ptr || next->io_bitmap_ptr)) {
> -		if (next->io_bitmap_ptr) {
> -			/*
> -			 * 4 cachelines copy ... not good, but not that
> -			 * bad either. Anyone got something better?
> -			 * This only affects processes which use ioperm().
> -			 * [Putting the TSSs into 4k-tlb mapped regions
> -			 * and playing VM tricks to switch the IO bitmap
> -			 * is not really acceptable.]
> -			 */
> -			memcpy(tss->io_bitmap, next->io_bitmap_ptr,
> -				IO_BITMAP_BYTES);
> -			tss->io_bitmap_base = IO_BITMAP_OFFSET;
> -		} else
> -			/*
> -			 * a bitmap offset pointing outside of the TSS limit
> -			 * causes a nicely controllable SIGSEGV if a process
> -			 * tries to use a port IO instruction. The first
> -			 * sys_ioperm() call sets up the bitmap properly.
> -			 */
> -			tss->io_bitmap_base = INVALID_IO_BITMAP_OFFSET;
> -	}
> +	/*
> +	 * Lazy TSS's I/O bitmap copy. We set an invalid offset here and
> +	 * we let the task to get a GPF in case an I/O instruction is performed.
> +	 * The handler of the GPF will verify that the faulting task has a valid
> +	 * I/O bitmap and, it true, does the real copy and restart the instruction.
> +	 * This will save us for redoundant copies when the currently switched task
> +	 * does not perform any I/O during its timeslice.
> +	 */
> +	tss->io_bitmap_base = next->io_bitmap_ptr ? INVALID_IO_BITMAP_OFFSET_LAZY:
> +		INVALID_IO_BITMAP_OFFSET;
> +
>  	return prev_p;
>  }
>  
> Index: arch/i386/kernel/traps.c
> ===================================================================
> RCS file: /usr/src/bkcvs/linux-2.5/arch/i386/kernel/traps.c,v
> retrieving revision 1.77
> diff -u -r1.77 traps.c
> --- a/arch/i386/kernel/traps.c	13 Jul 2004 18:02:33 -0000	1.77
> +++ b/arch/i386/kernel/traps.c	23 Aug 2004 19:34:52 -0000
> @@ -431,6 +431,26 @@
>  
>  asmlinkage void do_general_protection(struct pt_regs * regs, long error_code)
>  {
> +	int cpu = smp_processor_id();
> +	struct tss_struct *tss = init_tss + cpu;
> +	struct task_struct *tsk = current;
> +	struct thread_struct *tsk_th = &tsk->thread;
> +
> +	/*
> +	 * Perform the lazy TSS's I/O bitmap copy. If the TSS has an
> +	 * invalid offset set (the LAZY one) and the faulting thread has
> +	 * a valid I/O bitmap pointer, we copy the I/O bitmap in the TSS
> +	 * and we set the offset field correctly. Then we let the CPU to
> +	 * restart the faulting instruction.
> +	 */
> +	if (tss->io_bitmap_base == INVALID_IO_BITMAP_OFFSET_LAZY &&
> +	    tsk_th->io_bitmap_ptr) {
> +		memcpy(tss->io_bitmap, tsk_th->io_bitmap_ptr,
> +		       IO_BITMAP_BYTES);
> +		tss->io_bitmap_base = IO_BITMAP_OFFSET;
> +		return;
> +	}
> +
>  	if (regs->eflags & X86_EFLAGS_IF)
>  		local_irq_enable();
>   
> Index: include/asm-i386/processor.h
> ===================================================================
> RCS file: /usr/src/bkcvs/linux-2.5/include/asm-i386/processor.h,v
> retrieving revision 1.68
> diff -u -r1.68 processor.h
> --- a/include/asm-i386/processor.h	27 Jun 2004 17:53:48 -0000	1.68
> +++ b/include/asm-i386/processor.h	23 Aug 2004 19:32:58 -0000
> @@ -304,6 +304,7 @@
>  #define IO_BITMAP_LONGS (IO_BITMAP_BYTES/sizeof(long))
>  #define IO_BITMAP_OFFSET offsetof(struct tss_struct,io_bitmap)
>  #define INVALID_IO_BITMAP_OFFSET 0x8000
> +#define INVALID_IO_BITMAP_OFFSET_LAZY 0x9000
>  
>  struct i387_fsave_struct {
>  	long	cwd;
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

Use get/put_cpu() in the GPF handler to prevent preemption while 
handling the TSS.

--
				Brian Gerst

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [patch] lazy TSS's I/O bitmap copy ...
  2004-08-24  1:53 ` [patch] lazy TSS's I/O bitmap copy Brian Gerst
@ 2004-08-24  2:17   ` Linus Torvalds
  2004-08-24  4:30   ` Davide Libenzi
  1 sibling, 0 replies; 19+ messages in thread
From: Linus Torvalds @ 2004-08-24  2:17 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Davide Libenzi, Linux Kernel Mailing List, Andy Kleen,
	Andrew Morton



On Mon, 23 Aug 2004, Brian Gerst wrote:
> 
> Use get/put_cpu() in the GPF handler to prevent preemption while 
> handling the TSS.

Indeed. Good point.

		Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [patch] lazy TSS's I/O bitmap copy ...
  2004-08-24  1:53 ` [patch] lazy TSS's I/O bitmap copy Brian Gerst
  2004-08-24  2:17   ` Linus Torvalds
@ 2004-08-24  4:30   ` Davide Libenzi
  1 sibling, 0 replies; 19+ messages in thread
From: Davide Libenzi @ 2004-08-24  4:30 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Linux Kernel Mailing List, Andy Kleen, Andrew Morton,
	Linus Torvalds

On Mon, 23 Aug 2004, Brian Gerst wrote:

> Use get/put_cpu() in the GPF handler to prevent preemption while 
> handling the TSS.

Will do and repost. Thx!


- Davide


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [patch] lazy TSS's I/O bitmap copy ...
  2004-08-23 21:23 [patch] lazy TSS's I/O bitmap copy Davide Libenzi
  2004-08-23 21:32 ` Andi Kleen
  2004-08-24  1:53 ` [patch] lazy TSS's I/O bitmap copy Brian Gerst
@ 2004-08-24  6:51 ` Arjan van de Ven
  2004-08-24 15:13   ` Davide Libenzi
  2 siblings, 1 reply; 19+ messages in thread
From: Arjan van de Ven @ 2004-08-24  6:51 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Linux Kernel Mailing List, Andy Kleen, Andrew Morton,
	Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 571 bytes --]

On Mon, 2004-08-23 at 23:23, Davide Libenzi wrote:
> The following patch implements a lazy I/O bitmap copy for the i386 
> architecture. With I/O bitmaps now reaching considerable sizes, if the 
> switched task does not perform any I/O operation, we can save the copy 
> altogether. In my box X is working fine with the following patch, even if 
> more test would be required.


the thing is that X will not hit your fault path, since it runs with
iopl() called... your patch is a nice optimisation for X as a result,
however as test, X is almost worthless ;(

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [patch] ioport-cache-2.6.8.1.patch
  2004-08-23 21:39   ` Davide Libenzi
  2004-08-23 22:07     ` Linus Torvalds
@ 2004-08-24  7:19     ` Ingo Molnar
  2004-08-24 15:15       ` Davide Libenzi
  2004-08-24 19:38       ` Ryan Cumming
  1 sibling, 2 replies; 19+ messages in thread
From: Ingo Molnar @ 2004-08-24  7:19 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Andi Kleen, Linux Kernel Mailing List, Andrew Morton,
	Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 3043 bytes --]


(skip the discussion section for a description of the attached patch.)

* Davide Libenzi <davidel@xmailserver.org> wrote:

> > > +	/*
> > > +	 * Perform the lazy TSS's I/O bitmap copy. If the TSS has an
> > > +	 * invalid offset set (the LAZY one) and the faulting thread has
> > > +	 * a valid I/O bitmap pointer, we copy the I/O bitmap in the TSS
> > > +	 * and we set the offset field correctly. Then we let the CPU to
> > > +	 * restart the faulting instruction.
> > > +	 */
> > 
> > I don't like it very much that most GPFs will be executed twice now
> > when the process has ioperm enabled.
> > This will confuse debuggers and could have other bad side effects.
> > Checking the EIP would be better.
> 
> The eventually double GPF would happen only on TSS-IObmp-lazy tasks, ie 
> tasks using the I/O bitmap. The check for the I/O opcode can certainly be 
> done though, even if it'd make the code a little bit more complex.

another issue is that this code doesnt solve the 64K ports issue: even
with a perfect decoder ioperm() apps still see a ~80 usecs copying
latency (plus related cache trash effects) upon the first GPF - either
IO related or not. I dont think coupling this into the GPF handler is
all that good.

since 100% of Linux ioperm() apps currently use 1024 ports or less, i'd
prefer the 128 bytes (one cacheline on a P4) copy over any asynchronous
solution. (if someone wants more ports the price goes up. It should be
rare. I dont think X will ever go above 1024 ports.) We've already had
one security exploit in the lazy IO bitmap code, which further underlies
how dangerous such asynchronity is.

there's yet another danger: apps that _do_ use IO ports frequently will
see the most serious overhead via the GPF solution. They will most
likely switch to iopl(3) - which is an even less safe API than ioperm()
- so robustness suffers. So i think it's wrong policy too. Sorry :-|

but there's one additional step we can do ontop of the ports-max code to
get rid of copying in X.org's case: cache the last task that set up the
IO bitmap. This means we can set the offset to invalid and keep the IO
bitmap of that task, and switch back to a valid offset (without any
copying) when switching back to that task. (or do a copy if there is
another ioperm task we switch to.)

I've attached ioport-cache-2.6.8.1.patch that implements this. When
there's a single active ioperm() using task in the system then the
context-switch overhead is very low and constant:

 # ./ioperm-latency
 default no ioperm:             scheduling latency: 2478 cycles
 turning on port 80 ioperm:     scheduling latency: 2499 cycles
 turning on port 65535 ioperm:  scheduling latency: 2481 cycles

(updated ioperm-latency.c attached)

This single-ioperm-user situation matches 99% of the actual ioperm()
usage scenarios and gets rid of any copying whatsoever - without relying
on any fault mechanism. I can see no advantage of the GPF approach over
this patch.

the patch is against the most recent BK tree, i've tested it on x86 SMP
and UP.

	Ingo

[-- Attachment #2: ioport-cache-2.6.8.1.patch --]
[-- Type: text/plain, Size: 4601 bytes --]


cache the IO bitmap contents. If there is a single active task using ioports
then there is a very low and constant context-switch overhead from using
ioports:

 # ./ioperm-latency
 default no ioperm:             scheduling latency: 2478 cycles
 turning on port 80 ioperm:     scheduling latency: 2499 cycles
 turning on port 65535 ioperm:  scheduling latency: 2481 cycles

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/arch/i386/kernel/ioport.c.orig	
+++ linux/arch/i386/kernel/ioport.c	
@@ -56,7 +56,7 @@ static void set_bitmap(unsigned long *bi
  */
 asmlinkage long sys_ioperm(unsigned long from, unsigned long num, int turn_on)
 {
-	unsigned int i, max_long, bytes, bytes_updated;
+	unsigned long i, max_long, bytes, bytes_updated;
 	struct thread_struct * t = &current->thread;
 	struct tss_struct * tss;
 	unsigned long *bitmap;
@@ -107,6 +107,9 @@ asmlinkage long sys_ioperm(unsigned long
 
 	/* Update the TSS: */
 	memcpy(tss->io_bitmap, t->io_bitmap_ptr, bytes_updated);
+	tss->io_bitmap_max = bytes;
+	tss->io_bitmap_owner = &current->thread;
+	tss->io_bitmap_base = IO_BITMAP_OFFSET;
 
 	put_cpu();
 
--- linux/arch/i386/kernel/process.c.orig	
+++ linux/arch/i386/kernel/process.c	
@@ -306,8 +306,11 @@ void exit_thread(void)
 		/*
 		 * Careful, clear this in the TSS too:
 		 */
-		memset(tss->io_bitmap, 0xff, t->io_bitmap_max);
+		memset(tss->io_bitmap, 0xff, tss->io_bitmap_max);
 		t->io_bitmap_max = 0;
+		tss->io_bitmap_owner = NULL;
+		tss->io_bitmap_max = 0;
+		tss->io_bitmap_base = INVALID_IO_BITMAP_OFFSET;
 		put_cpu();
 	}
 }
@@ -477,6 +480,38 @@ int dump_task_regs(struct task_struct *t
 	return 1;
 }
 
+static inline void
+handle_io_bitmap(struct thread_struct *next, struct tss_struct *tss)
+{
+	if (!next->io_bitmap_ptr) {
+		/*
+		 * Disable the bitmap via an invalid offset. We still cache
+		 * the previous bitmap owner and the IO bitmap contents:
+		 */
+		tss->io_bitmap_base = INVALID_IO_BITMAP_OFFSET;
+		return;
+	}
+	if (likely(next == tss->io_bitmap_owner)) {
+		/*
+		 * Previous owner of the bitmap (hence the bitmap content)
+		 * matches the next task, we dont have to do anything but
+		 * to set a valid offset in the TSS:
+		 */
+		tss->io_bitmap_base = IO_BITMAP_OFFSET;
+		return;
+	}
+	/*
+	 * The IO bitmap in the TSS needs updating: copy the relevant
+	 * range of the new task's IO bitmap. Normally this is 128 bytes
+	 * or less:
+	 */
+	memcpy(tss->io_bitmap, next->io_bitmap_ptr,
+		max(tss->io_bitmap_max, next->io_bitmap_max));
+	tss->io_bitmap_max = next->io_bitmap_max;
+	tss->io_bitmap_owner = next;
+	tss->io_bitmap_base = IO_BITMAP_OFFSET;
+}
+
 /*
  * This special macro can be used to load a debugging register
  */
@@ -561,20 +596,8 @@ struct task_struct fastcall * __switch_t
 		loaddebug(next, 7);
 	}
 
-	if (unlikely(prev->io_bitmap_ptr || next->io_bitmap_ptr)) {
-		if (next->io_bitmap_ptr)
-			/*
-			 * Copy the relevant range of the IO bitmap.
-			 * Normally this is 128 bytes or less:
-			 */
-			memcpy(tss->io_bitmap, next->io_bitmap_ptr,
-				max(prev->io_bitmap_max, next->io_bitmap_max));
-		else
-			/*
-			 * Clear any possible leftover bits:
-			 */
-			memset(tss->io_bitmap, 0xff, prev->io_bitmap_max);
-	}
+	if (unlikely(prev->io_bitmap_ptr || next->io_bitmap_ptr))
+		handle_io_bitmap(next, tss);
 	return prev_p;
 }
 
--- linux/include/asm-i386/processor.h.orig	
+++ linux/include/asm-i386/processor.h	
@@ -358,6 +358,8 @@ typedef struct {
 	unsigned long seg;
 } mm_segment_t;
 
+struct thread_struct;
+
 struct tss_struct {
 	unsigned short	back_link,__blh;
 	unsigned long	esp0;
@@ -390,9 +392,14 @@ struct tss_struct {
 	 */
 	unsigned long	io_bitmap[IO_BITMAP_LONGS + 1];
 	/*
+	 * Cache the current maximum and the last task that used the bitmap:
+	 */
+	unsigned long io_bitmap_max;
+	struct thread_struct *io_bitmap_owner;
+	/*
 	 * pads the TSS to be cacheline-aligned (size is 0x100)
 	 */
-	unsigned long __cacheline_filler[37];
+	unsigned long __cacheline_filler[35];
 	/*
 	 * .. and then another 0x100 bytes for emergency kernel stack
 	 */
@@ -424,7 +431,7 @@ struct thread_struct {
 /* IO permissions */
 	unsigned long	*io_bitmap_ptr;
 /* max allowed port in the bitmap, in bytes: */
-	unsigned int	io_bitmap_max;
+	unsigned long	io_bitmap_max;
 };
 
 #define INIT_THREAD  {							\
@@ -444,7 +451,7 @@ struct thread_struct {
 	.ss0		= __KERNEL_DS,					\
 	.ss1		= __KERNEL_CS,					\
 	.ldt		= GDT_ENTRY_LDT,				\
-	.io_bitmap_base	= offsetof(struct tss_struct,io_bitmap),	\
+	.io_bitmap_base	= INVALID_IO_BITMAP_OFFSET,			\
 	.io_bitmap	= { [ 0 ... IO_BITMAP_LONGS] = ~0 },		\
 }
 

[-- Attachment #3: ioperm-latency.c --]
[-- Type: text/plain, Size: 1445 bytes --]

#include <errno.h>
#include <stdio.h>
#include <sched.h>
#include <signal.h>
#include <sys/io.h>
#include <stdlib.h>
#include <unistd.h>
#include <linux/unistd.h>

#define CYCLES(x) asm volatile ("rdtsc" :"=a" (x)::"edx")

#define __NR_sched_set_affinity 241
_syscall3 (int, sched_set_affinity, pid_t, pid, unsigned int, mask_len, unsigned long *, mask)

/*
 * Use a pair of RT processes bound to the same CPU to measure
 * context-switch overhead:
 */
static void measure(void)
{
	unsigned long i, min = ~0UL, mask = 1, t1, t2;

	sched_yield();
	for (i = 0; i < 100; i++) {
		asm volatile ("sti; nop; cli");
		CYCLES(t1);
		sched_yield();
		CYCLES(t2);
		if (i > 10) {
			if (t2 - t1 < min)
				min = t2 - t1;
		}
	}
	asm volatile ("sti");

	printf("scheduling latency: %ld cycles\n", min);
	sched_yield();
}

int main(void)
{
	struct sched_param p = { sched_priority: 2 };
	unsigned long mask = 1, pid;

	if (iopl(3)) {
		printf("need to run as root!\n");
		exit(-1);
	}
	sched_setscheduler(0, SCHED_FIFO, &p);
	sched_set_affinity(0, sizeof(mask), &mask);

	pid = fork();
	if (!pid)
		for (;;) {
			asm volatile ("sti; nop; cli");
			sched_yield();
		}

	printf("default no ioperm:             ");
	measure();

	printf("turning on port 80 ioperm:     ");
	ioperm(0x80,1,1);
	measure();

	printf("turning on port 65535 ioperm:  ");
	if (ioperm(0xffff,1,1))
		printf("FAILED - older kernel.\n");
	else
		measure();
	kill(pid, 9);

	return 0;
}


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [patch] lazy TSS's I/O bitmap copy ...
  2004-08-24  6:51 ` Arjan van de Ven
@ 2004-08-24 15:13   ` Davide Libenzi
  0 siblings, 0 replies; 19+ messages in thread
From: Davide Libenzi @ 2004-08-24 15:13 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Linux Kernel Mailing List, Andy Kleen, Andrew Morton,
	Linus Torvalds

On Tue, 24 Aug 2004, Arjan van de Ven wrote:

> On Mon, 2004-08-23 at 23:23, Davide Libenzi wrote:
> > The following patch implements a lazy I/O bitmap copy for the i386 
> > architecture. With I/O bitmaps now reaching considerable sizes, if the 
> > switched task does not perform any I/O operation, we can save the copy 
> > altogether. In my box X is working fine with the following patch, even if 
> > more test would be required.
> 
> the thing is that X will not hit your fault path, since it runs with
> iopl() called... your patch is a nice optimisation for X as a result,
> however as test, X is almost worthless ;(

Arjan, using FC1 here and X definitely hits the GPF path. A few crashes 
when doing The Wrong Thing confirmed it yesterday :-) Also, according to 
the snippet Alan posted, X does iopl() on fallback of failing ioperm() IIRC.



- Davide


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [patch] ioport-cache-2.6.8.1.patch
  2004-08-24  7:19     ` [patch] ioport-cache-2.6.8.1.patch Ingo Molnar
@ 2004-08-24 15:15       ` Davide Libenzi
  2004-08-24 19:38       ` Ryan Cumming
  1 sibling, 0 replies; 19+ messages in thread
From: Davide Libenzi @ 2004-08-24 15:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Linux Kernel Mailing List, Andrew Morton,
	Linus Torvalds

On Tue, 24 Aug 2004, Ingo Molnar wrote:

> another issue is that this code doesnt solve the 64K ports issue: even
> with a perfect decoder ioperm() apps still see a ~80 usecs copying
> latency (plus related cache trash effects) upon the first GPF - either
> IO related or not. I dont think coupling this into the GPF handler is
> all that good.

So, correct me if I'm wrong, you want this price to be paid at *every* 
context switch, isn't it? Independently from the fact that the task does 
or does not I/O operations.



> since 100% of Linux ioperm() apps currently use 1024 ports or less, i'd
> prefer the 128 bytes (one cacheline on a P4) copy over any asynchronous
> solution. (if someone wants more ports the price goes up. It should be
> rare. I dont think X will ever go above 1024 ports.) We've already had
> one security exploit in the lazy IO bitmap code, which further underlies
> how dangerous such asynchronity is.

It was in the lazy FPU code ;) and this patch is utterly simple and sets 
the most restrictive policy, by later verifying in the GPF code. It does 
not leave stale bitmaps from previous tasks in search of optimizations.



> there's yet another danger: apps that _do_ use IO ports frequently will
> see the most serious overhead via the GPF solution. They will most
> likely switch to iopl(3) - which is an even less safe API than ioperm()
> - so robustness suffers. So i think it's wrong policy too. Sorry :-|

Apps that do use I/O a lot will likely issue more than one I/O per context 
switch, and the cost of even one I/O op will be greater than the GPF cost. 
This w/out even accounting the cost saved on context switches where no I/O 
is done. Talking about X for example, yesterday test (that I should better 
confirm today) revealed that more than 60% of context switches do *not* 
trigger the GPF fault, that is, for in 60+% of context switches X does not 
use I/O operations. This is not a surprise given the way the X server works.



> but there's one additional step we can do ontop of the ports-max code to
> get rid of copying in X.org's case: cache the last task that set up the
> IO bitmap. This means we can set the offset to invalid and keep the IO
> bitmap of that task, and switch back to a valid offset (without any
> copying) when switching back to that task. (or do a copy if there is
> another ioperm task we switch to.)

Personally Ingo, I do not like logics like:

if (io_apps <= 1) pretty_good(); then screwed();

But that's just a matter of personal taste. Anyway, today I'll recode with 
Brian get/put_cpu suggestion and on top of var-bitmap bits, and I'll 
repost. Then if you guys like it you take it, otherwise we keep the 
current memcpy/memset on switch_to.



- Davide


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [patch] ioport-cache-2.6.8.1.patch
  2004-08-24  7:19     ` [patch] ioport-cache-2.6.8.1.patch Ingo Molnar
  2004-08-24 15:15       ` Davide Libenzi
@ 2004-08-24 19:38       ` Ryan Cumming
  2004-08-24 20:20         ` Ingo Molnar
  1 sibling, 1 reply; 19+ messages in thread
From: Ryan Cumming @ 2004-08-24 19:38 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 490 bytes --]

On Tuesday 24 August 2004 00:19, you wrote:
> +       if (likely(next == tss->io_bitmap_owner)) {

Probably a stupid question, but what's stopping the tss->io_bitmap_owner from 
being killed, and then a new thread_struct being kmalloc()'ed in the exact 
same place as the old one? I realize it's highly unlikely, I'm just wondering 
if it's possible at all.

I guess clearing tss->io_bitmap_owner whenever we kfree() the bitmap owner's 
thread_struct would plug that up.

-Ryan

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [patch] ioport-cache-2.6.8.1.patch
  2004-08-24 19:38       ` Ryan Cumming
@ 2004-08-24 20:20         ` Ingo Molnar
  0 siblings, 0 replies; 19+ messages in thread
From: Ingo Molnar @ 2004-08-24 20:20 UTC (permalink / raw)
  To: Ryan Cumming; +Cc: Linux Kernel Mailing List


* Ryan Cumming <ryan@spitfire.gotdns.org> wrote:

> On Tuesday 24 August 2004 00:19, you wrote:
> > +       if (likely(next == tss->io_bitmap_owner)) {
> 
> Probably a stupid question, but what's stopping the tss->io_bitmap_owner from being killed, and then a new
> thread_struct being kmalloc()'ed in the exact same place as the old one? I realize it's highly unlikely, I'm just
> wondering if it's possible at all.
> 
> I guess clearing tss->io_bitmap_owner whenever we kfree() the bitmap owner's thread_struct would plug that up.

the patch flushes the ->io_bitmap_owner info on thread exit.

	Ingo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [patch] lazy TSS's I/O bitmap copy ...
  2004-08-23 22:18       ` Davide Libenzi
  2004-08-23 22:27         ` Linus Torvalds
@ 2004-08-28 19:15         ` Alan Cox
  1 sibling, 0 replies; 19+ messages in thread
From: Alan Cox @ 2004-08-28 19:15 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Linus Torvalds, Andi Kleen, Linux Kernel Mailing List,
	Andrew Morton

On Llu, 2004-08-23 at 23:18, Davide Libenzi wrote:
> > >	 The check for the I/O opcode can certainly be 
> > > done though, even if it'd make the code a little bit more complex.
> > 
> > Have to be very careful there to avoid nasty security issues. And with
> > ins/outs, you can have various prefixes etc, so decoding is not as trivial
> > as it could otherwise be. Even the regular in/out can have a data size
> > overrides..

And an attacker can be changing the mappings in another thread at the
same time. Now you see it, now you don't. This gets quite fun with
writing fixups to user space code but isnt a big deal for reads.

> > In short, I think that if we do this at all, I'd much rather just do the
> > simple "trap twice" thing that Davide did. It's too easy to get it wrong
> > otherwise.
> 
> IMHO since the GPF path is not a fast path like a page fault, we shouldn't 
> be in bad shape with the current approach. No?

Even the X server doesn't generally make heavy use of I/O port access on
any vaguely modern hardware. The BIOS might but its already doing some
of its own vm86 fixups for this.


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2004-08-28 20:37 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-08-23 21:23 [patch] lazy TSS's I/O bitmap copy Davide Libenzi
2004-08-23 21:32 ` Andi Kleen
2004-08-23 21:39   ` Davide Libenzi
2004-08-23 22:07     ` Linus Torvalds
2004-08-23 22:18       ` Davide Libenzi
2004-08-23 22:27         ` Linus Torvalds
2004-08-28 19:15         ` Alan Cox
2004-08-23 22:54       ` Davide Libenzi
2004-08-23 23:09         ` Linus Torvalds
2004-08-23 23:33           ` Davide Libenzi
2004-08-24  7:19     ` [patch] ioport-cache-2.6.8.1.patch Ingo Molnar
2004-08-24 15:15       ` Davide Libenzi
2004-08-24 19:38       ` Ryan Cumming
2004-08-24 20:20         ` Ingo Molnar
2004-08-24  1:53 ` [patch] lazy TSS's I/O bitmap copy Brian Gerst
2004-08-24  2:17   ` Linus Torvalds
2004-08-24  4:30   ` Davide Libenzi
2004-08-24  6:51 ` Arjan van de Ven
2004-08-24 15:13   ` Davide Libenzi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox