virtualization.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] x86: Support compiling out userspace I/O (iopl and ioperm)
@ 2013-10-22  2:33 Josh Triplett
  2013-10-22  2:34 ` [PATCH 1/3] x86: process: Unify 32-bit and 64-bit copy_thread I/O bitmap handling Josh Triplett
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Josh Triplett @ 2013-10-22  2:33 UTC (permalink / raw)
  To: linux-kernel, virtualization, x86, xen-devel
  Cc: Alexander van Heukelum, Jeremy Fitzhardinge, Raghavendra K T,
	Len Brown, Frederic Weisbecker, H. Peter Anvin, Paul Gortmaker,
	David Herrmann, Masami Hiramatsu, Seiji Aguchi, Jiri Slaby,
	Alok Kataria, Jesper Nilsson, Andi Kleen, Daniel Lezcano,
	Ingo Molnar, Steven Rostedt, Borislav Petkov, Fenghua Yu,
	Kees Cook, Konrad Rzeszutek Wilk, Ross Lagerwall, Chris

This patch series makes it possible to compile out the iopl and ioperm system
calls, which allow privileged processes to request permission to directly poke
I/O ports from userspace.

Nothing on a modern Linux system uses these calls anymore, and anything new
should be using /dev/port instead, or better yet writing a driver.

Copying the bloat-o-meter stats from the final patch:

32-bit bloat-o-meter:
add/remove: 0/3 grow/shrink: 0/10 up/down: 0/-17681 (-17681)
function                                     old     new   delta
cpu_init                                     676     668      -8
ioperm_active                                 18       7     -11
init_task                                   1296    1284     -12
exit_thread                                  179      91     -88
ioperm_get                                   103      10     -93
__switch_to_xtra                             254     161     -93
sys_iopl                                     127       -    -127
SyS_iopl                                     127       -    -127
copy_thread                                  606     446    -160
vt_ioctl                                    4127    3919    -208
sys_ioperm                                   370       -    -370
init_tss                                    8576     384   -8192
doublefault_tss                             8576     384   -8192

64-bit bloat-o-meter:
add/remove: 0/4 grow/shrink: 2/9 up/down: 45/-9764 (-9719)
function                                     old     new   delta
cpu_init                                     958     995     +37
arch_align_stack                              78      86      +8
perf_event_exit_task                         525     517      -8
ioperm_active                                 17       8      -9
init_task                                   1968    1944     -24
stub_iopl                                     81       -     -81
ioperm_get                                   111      11    -100
__switch_to_xtra                             281     164    -117
exit_thread                                  212      92    -120
vt_ioctl                                    4432    4304    -128
sys_iopl                                     137       -    -137
SyS_iopl                                     137       -    -137
copy_thread                                  694     520    -174
sys_ioperm                                   473       -    -473
init_tss                                    8896     640   -8256

Josh Triplett (3):
  x86: process: Unify 32-bit and 64-bit copy_thread I/O bitmap handling
  x86: tss: Eliminate fragile calculation of TSS segment limit
  x86: Support compiling out userspace I/O (iopl and ioperm)

 arch/x86/Kconfig                      | 10 ++++
 arch/x86/include/asm/desc.h           | 11 +----
 arch/x86/include/asm/paravirt.h       |  2 +
 arch/x86/include/asm/paravirt_types.h |  2 +
 arch/x86/include/asm/processor.h      | 54 +++++++++++++++++---
 arch/x86/include/asm/syscalls.h       |  3 ++
 arch/x86/kernel/Makefile              |  3 +-
 arch/x86/kernel/cpu/common.c          | 12 +----
 arch/x86/kernel/entry_64.S            |  9 ++--
 arch/x86/kernel/paravirt.c            |  2 +
 arch/x86/kernel/process-io.h          | 93 +++++++++++++++++++++++++++++++++++
 arch/x86/kernel/process.c             | 34 ++-----------
 arch/x86/kernel/process_32.c          | 40 ++++-----------
 arch/x86/kernel/process_64.c          | 26 ++--------
 arch/x86/kernel/ptrace.c              |  8 +++
 arch/x86/xen/enlighten.c              |  4 ++
 drivers/tty/vt/vt_ioctl.c             |  2 +-
 kernel/sys_ni.c                       |  5 ++
 18 files changed, 206 insertions(+), 114 deletions(-)
 create mode 100644 arch/x86/kernel/process-io.h

-- 
1.8.4.rc3

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/3] x86: process: Unify 32-bit and 64-bit copy_thread I/O bitmap handling
  2013-10-22  2:33 [PATCH 0/3] x86: Support compiling out userspace I/O (iopl and ioperm) Josh Triplett
@ 2013-10-22  2:34 ` Josh Triplett
  2013-10-30 22:21   ` Kees Cook
  2013-10-31 20:01   ` Alexander van Heukelum
  2013-10-22  2:34 ` [PATCH 2/3] x86: tss: Eliminate fragile calculation of TSS segment limit Josh Triplett
  2013-10-22  2:35 ` [PATCH 3/3] x86: Support compiling out userspace I/O (iopl and ioperm) Josh Triplett
  2 siblings, 2 replies; 17+ messages in thread
From: Josh Triplett @ 2013-10-22  2:34 UTC (permalink / raw)
  To: linux-kernel, virtualization, x86, xen-devel
  Cc: Alexander van Heukelum, Jeremy Fitzhardinge, Raghavendra K T,
	Len Brown, Frederic Weisbecker, H. Peter Anvin, Paul Gortmaker,
	David Herrmann, Masami Hiramatsu, Seiji Aguchi, Jiri Slaby,
	Alok Kataria, Jesper Nilsson, Andi Kleen, Daniel Lezcano,
	Ingo Molnar, Steven Rostedt, Borislav Petkov, Fenghua Yu,
	Kees Cook, Konrad Rzeszutek Wilk, Ross Lagerwall, Chris

The 32-bit and 64-bit versions of copy_thread have functionally
identical handling for copying the I/O bitmap, modulo differences in
error handling.  Clean up the error paths in both by moving the copy of
the I/O bitmap to the end, to eliminate the need to free it if
subsequent copy steps fail; move the resulting identical code to a
static inline in a common header.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
---
 arch/x86/kernel/process-io.h | 22 ++++++++++++++++++++++
 arch/x86/kernel/process_32.c | 29 ++++++++---------------------
 arch/x86/kernel/process_64.c | 26 +++++---------------------
 3 files changed, 35 insertions(+), 42 deletions(-)
 create mode 100644 arch/x86/kernel/process-io.h

diff --git a/arch/x86/kernel/process-io.h b/arch/x86/kernel/process-io.h
new file mode 100644
index 0000000..d884444
--- /dev/null
+++ b/arch/x86/kernel/process-io.h
@@ -0,0 +1,22 @@
+#ifndef _X86_KERNEL_PROCESS_IO_H
+#define _X86_KERNEL_PROCESS_IO_H
+
+static inline int copy_io_bitmap(struct task_struct *me,
+				 struct task_struct *p)
+{
+	if (unlikely(test_tsk_thread_flag(me, TIF_IO_BITMAP))) {
+		p->thread.io_bitmap_ptr = kmemdup(me->thread.io_bitmap_ptr,
+						  IO_BITMAP_BYTES, GFP_KERNEL);
+		if (!p->thread.io_bitmap_ptr) {
+			p->thread.io_bitmap_max = 0;
+			return -ENOMEM;
+		}
+		set_tsk_thread_flag(p, TIF_IO_BITMAP);
+	} else {
+		p->thread.io_bitmap_ptr = NULL;
+	}
+
+	return 0;
+}
+
+#endif /* _X86_KERNEL_PROCESS_IO_H */
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 884f98f..e86028b 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -56,6 +56,8 @@
 #include <asm/debugreg.h>
 #include <asm/switch_to.h>
 
+#include "process-io.h"
+
 asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
 asmlinkage void ret_from_kernel_thread(void) __asm__("ret_from_kernel_thread");
 
@@ -135,7 +137,6 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,
 {
 	struct pt_regs *childregs = task_pt_regs(p);
 	struct task_struct *tsk;
-	int err;
 
 	p->thread.sp = (unsigned long) childregs;
 	p->thread.sp0 = (unsigned long) (childregs+1);
@@ -167,36 +168,22 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,
 	task_user_gs(p) = get_user_gs(current_pt_regs());
 
 	p->fpu_counter = 0;
-	p->thread.io_bitmap_ptr = NULL;
 	tsk = current;
-	err = -ENOMEM;
 
 	memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps));
 
-	if (unlikely(test_tsk_thread_flag(tsk, TIF_IO_BITMAP))) {
-		p->thread.io_bitmap_ptr = kmemdup(tsk->thread.io_bitmap_ptr,
-						IO_BITMAP_BYTES, GFP_KERNEL);
-		if (!p->thread.io_bitmap_ptr) {
-			p->thread.io_bitmap_max = 0;
-			return -ENOMEM;
-		}
-		set_tsk_thread_flag(p, TIF_IO_BITMAP);
-	}
-
-	err = 0;
-
 	/*
 	 * Set a new TLS for the child thread?
 	 */
-	if (clone_flags & CLONE_SETTLS)
+	if (clone_flags & CLONE_SETTLS) {
+		int err;
 		err = do_set_thread_area(p, -1,
 			(struct user_desc __user *)childregs->si, 0);
-
-	if (err && p->thread.io_bitmap_ptr) {
-		kfree(p->thread.io_bitmap_ptr);
-		p->thread.io_bitmap_max = 0;
+		if(err)
+			return err;
 	}
-	return err;
+
+	return copy_io_bitmap(tsk, p);
 }
 
 void
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index bb1dc51..6e680ad 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -50,6 +50,8 @@
 #include <asm/debugreg.h>
 #include <asm/switch_to.h>
 
+#include "process-io.h"
+
 asmlinkage extern void ret_from_fork(void);
 
 asmlinkage DEFINE_PER_CPU(unsigned long, old_rsp);
@@ -154,7 +156,6 @@ static inline u32 read_32bit_tls(struct task_struct *t, int tls)
 int copy_thread(unsigned long clone_flags, unsigned long sp,
 		unsigned long arg, struct task_struct *p)
 {
-	int err;
 	struct pt_regs *childregs;
 	struct task_struct *me = current;
 
@@ -164,7 +165,6 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,
 	p->thread.usersp = me->thread.usersp;
 	set_tsk_thread_flag(p, TIF_FORK);
 	p->fpu_counter = 0;
-	p->thread.io_bitmap_ptr = NULL;
 
 	savesegment(gs, p->thread.gsindex);
 	p->thread.gs = p->thread.gsindex ? 0 : me->thread.gs;
@@ -192,23 +192,13 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,
 	if (sp)
 		childregs->sp = sp;
 
-	err = -ENOMEM;
 	memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps));
 
-	if (unlikely(test_tsk_thread_flag(me, TIF_IO_BITMAP))) {
-		p->thread.io_bitmap_ptr = kmemdup(me->thread.io_bitmap_ptr,
-						  IO_BITMAP_BYTES, GFP_KERNEL);
-		if (!p->thread.io_bitmap_ptr) {
-			p->thread.io_bitmap_max = 0;
-			return -ENOMEM;
-		}
-		set_tsk_thread_flag(p, TIF_IO_BITMAP);
-	}
-
 	/*
 	 * Set a new TLS for the child thread?
 	 */
 	if (clone_flags & CLONE_SETTLS) {
+		int err;
 #ifdef CONFIG_IA32_EMULATION
 		if (test_thread_flag(TIF_IA32))
 			err = do_set_thread_area(p, -1,
@@ -217,16 +207,10 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,
 #endif
 			err = do_arch_prctl(p, ARCH_SET_FS, childregs->r8);
 		if (err)
-			goto out;
-	}
-	err = 0;
-out:
-	if (err && p->thread.io_bitmap_ptr) {
-		kfree(p->thread.io_bitmap_ptr);
-		p->thread.io_bitmap_max = 0;
+			return err;
 	}
 
-	return err;
+	return copy_io_bitmap(me, p);
 }
 
 static void
-- 
1.8.4.rc3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/3] x86: tss: Eliminate fragile calculation of TSS segment limit
  2013-10-22  2:33 [PATCH 0/3] x86: Support compiling out userspace I/O (iopl and ioperm) Josh Triplett
  2013-10-22  2:34 ` [PATCH 1/3] x86: process: Unify 32-bit and 64-bit copy_thread I/O bitmap handling Josh Triplett
@ 2013-10-22  2:34 ` Josh Triplett
  2013-10-30 22:22   ` Kees Cook
  2013-10-31 20:02   ` Alexander van Heukelum
  2013-10-22  2:35 ` [PATCH 3/3] x86: Support compiling out userspace I/O (iopl and ioperm) Josh Triplett
  2 siblings, 2 replies; 17+ messages in thread
From: Josh Triplett @ 2013-10-22  2:34 UTC (permalink / raw)
  To: linux-kernel, virtualization, x86, xen-devel
  Cc: Alexander van Heukelum, Jeremy Fitzhardinge, Raghavendra K T,
	Len Brown, Frederic Weisbecker, H. Peter Anvin, Paul Gortmaker,
	David Herrmann, Masami Hiramatsu, Seiji Aguchi, Jiri Slaby,
	Alok Kataria, Jesper Nilsson, Andi Kleen, Daniel Lezcano,
	Ingo Molnar, Steven Rostedt, Borislav Petkov, Fenghua Yu,
	Kees Cook, Konrad Rzeszutek Wilk, Ross Lagerwall, Chris

__set_tss_desc has a complex calculation of the TSS segment limit,
duplicating the quirky details of the I/O bitmap array length, and
requiring a complex comment to explain.  Replace that calculation with a
simpler one based on the offsetof the "stack" field that follows the
array.

That then removes the last use of IO_BITMAP_OFFSET, so delete it.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
---
 arch/x86/include/asm/desc.h      | 11 +----------
 arch/x86/include/asm/processor.h |  3 ++-
 2 files changed, 3 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index b90e5df..17ac92f 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -177,16 +177,7 @@ static inline void __set_tss_desc(unsigned cpu, unsigned int entry, void *addr)
 	struct desc_struct *d = get_cpu_gdt_table(cpu);
 	tss_desc tss;
 
-	/*
-	 * sizeof(unsigned long) coming from an extra "long" at the end
-	 * of the iobitmap. See tss_struct definition in processor.h
-	 *
-	 * -1? seg base+limit should be pointing to the address of the
-	 * last valid byte
-	 */
-	set_tssldt_descriptor(&tss, (unsigned long)addr, DESC_TSS,
-			      IO_BITMAP_OFFSET + IO_BITMAP_BYTES +
-			      sizeof(unsigned long) - 1);
+	set_tssldt_descriptor(&tss, (unsigned long)addr, DESC_TSS, TSS_LIMIT);
 	write_gdt_entry(d, entry, &tss, DESC_TSS);
 }
 
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 987c75e..03d3003 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -259,9 +259,10 @@ struct x86_hw_tss {
 #define IO_BITMAP_BITS			65536
 #define IO_BITMAP_BYTES			(IO_BITMAP_BITS/8)
 #define IO_BITMAP_LONGS			(IO_BITMAP_BYTES/sizeof(long))
-#define IO_BITMAP_OFFSET		offsetof(struct tss_struct, io_bitmap)
 #define INVALID_IO_BITMAP_OFFSET	0x8000
 
+#define TSS_LIMIT	(offsetof(struct tss_struct, stack) - 1)
+
 struct tss_struct {
 	/*
 	 * The hardware state:
-- 
1.8.4.rc3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/3] x86: Support compiling out userspace I/O (iopl and ioperm)
  2013-10-22  2:33 [PATCH 0/3] x86: Support compiling out userspace I/O (iopl and ioperm) Josh Triplett
  2013-10-22  2:34 ` [PATCH 1/3] x86: process: Unify 32-bit and 64-bit copy_thread I/O bitmap handling Josh Triplett
  2013-10-22  2:34 ` [PATCH 2/3] x86: tss: Eliminate fragile calculation of TSS segment limit Josh Triplett
@ 2013-10-22  2:35 ` Josh Triplett
  2013-10-26  3:17   ` Stephen Hemminger
  2013-10-31 20:04   ` Alexander van Heukelum
  2 siblings, 2 replies; 17+ messages in thread
From: Josh Triplett @ 2013-10-22  2:35 UTC (permalink / raw)
  To: linux-kernel, virtualization, x86, xen-devel
  Cc: Alexander van Heukelum, Jeremy Fitzhardinge, Raghavendra K T,
	Len Brown, Frederic Weisbecker, H. Peter Anvin, Paul Gortmaker,
	David Herrmann, Masami Hiramatsu, Seiji Aguchi, Jiri Slaby,
	Alok Kataria, Jesper Nilsson, Andi Kleen, Daniel Lezcano,
	Ingo Molnar, Steven Rostedt, Borislav Petkov, Fenghua Yu,
	Kees Cook, Konrad Rzeszutek Wilk, Ross Lagerwall, Chris

On the vast majority of modern systems, no processes will use the
userspsace I/O syscalls, iopl and ioperm.  Add a new config option,
CONFIG_X86_IOPORT, to support configuring them out of the kernel
entirely.  Since these syscalls only exist to support rare legacy
userspace programs, X86_IOPORT does not depend on EXPERT, though it does
still default to y.

In addition to saving a significant amount of space, this also reduces
the size of several major kernel data structures, drops a bit of code
from several task-related hot paths, and reduces the attack surface of
the kernel.

All of the task-related code dealing with userspace I/O permissions now
lives in process-io.h as several new inline functions, which become
no-ops when CONFIG_X86_IOPORT.

bloat-o-meter shows a net reduction of 17681 bytes on 32-bit and 9719
bytes on 64-bit:

32-bit bloat-o-meter:
add/remove: 0/3 grow/shrink: 0/10 up/down: 0/-17681 (-17681)
function                                     old     new   delta
cpu_init                                     676     668      -8
ioperm_active                                 18       7     -11
init_task                                   1296    1284     -12
exit_thread                                  179      91     -88
ioperm_get                                   103      10     -93
__switch_to_xtra                             254     161     -93
sys_iopl                                     127       -    -127
SyS_iopl                                     127       -    -127
copy_thread                                  606     446    -160
vt_ioctl                                    4127    3919    -208
sys_ioperm                                   370       -    -370
init_tss                                    8576     384   -8192
doublefault_tss                             8576     384   -8192

64-bit bloat-o-meter:
add/remove: 0/4 grow/shrink: 2/9 up/down: 45/-9764 (-9719)
function                                     old     new   delta
cpu_init                                     958     995     +37
arch_align_stack                              78      86      +8
perf_event_exit_task                         525     517      -8
ioperm_active                                 17       8      -9
init_task                                   1968    1944     -24
stub_iopl                                     81       -     -81
ioperm_get                                   111      11    -100
__switch_to_xtra                             281     164    -117
exit_thread                                  212      92    -120
vt_ioctl                                    4432    4304    -128
sys_iopl                                     137       -    -137
SyS_iopl                                     137       -    -137
copy_thread                                  694     520    -174
sys_ioperm                                   473       -    -473
init_tss                                    8896     640   -8256

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
---
 arch/x86/Kconfig                      | 10 +++++
 arch/x86/include/asm/paravirt.h       |  2 +
 arch/x86/include/asm/paravirt_types.h |  2 +
 arch/x86/include/asm/processor.h      | 51 +++++++++++++++++++++----
 arch/x86/include/asm/syscalls.h       |  3 ++
 arch/x86/kernel/Makefile              |  3 +-
 arch/x86/kernel/cpu/common.c          | 12 +-----
 arch/x86/kernel/entry_64.S            |  9 +++--
 arch/x86/kernel/paravirt.c            |  2 +
 arch/x86/kernel/process-io.h          | 71 +++++++++++++++++++++++++++++++++++
 arch/x86/kernel/process.c             | 34 ++---------------
 arch/x86/kernel/process_32.c          | 11 +-----
 arch/x86/kernel/ptrace.c              |  8 ++++
 arch/x86/xen/enlighten.c              |  4 ++
 drivers/tty/vt/vt_ioctl.c             |  2 +-
 kernel/sys_ni.c                       |  5 +++
 16 files changed, 168 insertions(+), 61 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index e241a19..d5b1e68 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -976,6 +976,16 @@ config VM86
 	  XFree86 to initialize some video cards via BIOS. Disabling this
 	  option saves about 6k.
 
+config X86_IOPORT
+	bool "iopl and ioperm system calls"
+	default y
+	---help---
+	  This option enables the iopl and ioperm system calls, which allow
+	  privileged userspace processes to directly access I/O ports. This
+	  is used by some legacy software to drive hardware directly from
+	  userspace rather than via a proper kernel driver. Unless you intend
+	  to run such software, you can safely say N here.
+
 config TOSHIBA
 	tristate "Toshiba Laptop support"
 	depends on X86_32
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 401f350..2e106d5 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -299,10 +299,12 @@ static inline void write_idt_entry(gate_desc *dt, int entry, const gate_desc *g)
 {
 	PVOP_VCALL3(pv_cpu_ops.write_idt_entry, dt, entry, g);
 }
+#ifdef CONFIG_X86_IOPORT
 static inline void set_iopl_mask(unsigned mask)
 {
 	PVOP_VCALL1(pv_cpu_ops.set_iopl_mask, mask);
 }
+#endif /* CONFIG_X86_IOPORT */
 
 /* The paravirtualized I/O functions */
 static inline void slow_down_io(void)
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index aab8f67..bc5bfeb 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -142,7 +142,9 @@ struct pv_cpu_ops {
 
 	void (*load_sp0)(struct tss_struct *tss, struct thread_struct *t);
 
+#ifdef CONFIG_X86_IOPORT
 	void (*set_iopl_mask)(unsigned mask);
+#endif /* CONFIG_X86_IOPORT */
 
 	void (*wbinvd)(void);
 	void (*io_delay)(void);
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 03d3003..48f6fbe 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -256,7 +256,11 @@ struct x86_hw_tss {
 /*
  * IO-bitmap sizes:
  */
+#ifdef CONFIG_X86_IOPORT
 #define IO_BITMAP_BITS			65536
+#else
+#define IO_BITMAP_BITS			0
+#endif
 #define IO_BITMAP_BYTES			(IO_BITMAP_BITS/8)
 #define IO_BITMAP_LONGS			(IO_BITMAP_BYTES/sizeof(long))
 #define INVALID_IO_BITMAP_OFFSET	0x8000
@@ -269,6 +273,7 @@ struct tss_struct {
 	 */
 	struct x86_hw_tss	x86_tss;
 
+#ifdef CONFIG_X86_IOPORT
 	/*
 	 * The extra 1 is there because the CPU will access an
 	 * additional byte beyond the end of the IO permission
@@ -276,6 +281,7 @@ struct tss_struct {
 	 * be within the limit.
 	 */
 	unsigned long		io_bitmap[IO_BITMAP_LONGS + 1];
+#endif /* CONFIG_X86_IOPORT */
 
 	/*
 	 * .. and then another 0x100 bytes for the emergency kernel stack:
@@ -286,6 +292,24 @@ struct tss_struct {
 
 DECLARE_PER_CPU_SHARED_ALIGNED(struct tss_struct, init_tss);
 
+static inline void init_tss_io(struct tss_struct *t)
+{
+#ifdef CONFIG_X86_IOPORT
+	int i;
+
+	t->x86_tss.io_bitmap_base = offsetof(struct tss_struct, io_bitmap);
+
+	/*
+	 * <= is required because the CPU will access up to
+	 * 8 bits beyond the end of the IO permission bitmap.
+	 */
+	for (i = 0; i <= IO_BITMAP_LONGS; i++)
+		t->io_bitmap[i] = ~0UL;
+#else
+	t->x86_tss.io_bitmap_base = INVALID_IO_BITMAP_OFFSET;
+#endif
+}
+
 /*
  * Save the original ist values for checking stack pointers during debugging
  */
@@ -484,13 +508,16 @@ struct thread_struct {
 	unsigned int		saved_fs;
 	unsigned int		saved_gs;
 #endif
+#ifdef CONFIG_X86_IOPORT
 	/* IO permissions: */
 	unsigned long		*io_bitmap_ptr;
 	unsigned long		iopl;
 	/* Max allowed port in the bitmap, in bytes: */
 	unsigned		io_bitmap_max;
+#endif /* CONFIG_X86_IOPORT */
 };
 
+#ifdef CONFIG_X86_IOPORT
 /*
  * Set IOPL bits in EFLAGS from given mask
  */
@@ -509,6 +536,7 @@ static inline void native_set_iopl_mask(unsigned mask)
 		      : "i" (~X86_EFLAGS_IOPL), "r" (mask));
 #endif
 }
+#endif /* CONFIG_X86_IOPORT */
 
 static inline void
 native_load_sp0(struct tss_struct *tss, struct thread_struct *thread)
@@ -828,12 +856,8 @@ static inline void spin_lock_prefetch(const void *x)
 #define STACK_TOP		TASK_SIZE
 #define STACK_TOP_MAX		STACK_TOP
 
-#define INIT_THREAD  {							  \
-	.sp0			= sizeof(init_stack) + (long)&init_stack, \
-	.vm86_info		= NULL,					  \
-	.sysenter_cs		= __KERNEL_CS,				  \
-	.io_bitmap_ptr		= NULL,					  \
-}
+#ifdef CONFIG_X86_IOPORT
+#define INIT_THREAD_IO .io_bitmap_ptr = NULL,
 
 /*
  * Note that the .io_bitmap member must be extra-big. This is because
@@ -841,6 +865,19 @@ static inline void spin_lock_prefetch(const void *x)
  * permission bitmap. The extra byte must be all 1 bits, and must
  * be within the limit.
  */
+#define INIT_TSS_IO .io_bitmap = { [0 ... IO_BITMAP_LONGS] = ~0 },
+#else
+#define INIT_THREAD_IO
+#define INIT_TSS_IO
+#endif
+
+#define INIT_THREAD  {							  \
+	.sp0			= sizeof(init_stack) + (long)&init_stack, \
+	.vm86_info		= NULL,					  \
+	.sysenter_cs		= __KERNEL_CS,				  \
+	INIT_THREAD_IO							  \
+}
+
 #define INIT_TSS  {							  \
 	.x86_tss = {							  \
 		.sp0		= sizeof(init_stack) + (long)&init_stack, \
@@ -848,7 +885,7 @@ static inline void spin_lock_prefetch(const void *x)
 		.ss1		= __KERNEL_CS,				  \
 		.io_bitmap_base	= INVALID_IO_BITMAP_OFFSET,		  \
 	 },								  \
-	.io_bitmap		= { [0 ... IO_BITMAP_LONGS] = ~0 },	  \
+	INIT_TSS_IO							  \
 }
 
 extern unsigned long thread_saved_pc(struct task_struct *tsk);
diff --git a/arch/x86/include/asm/syscalls.h b/arch/x86/include/asm/syscalls.h
index 592a6a6..4db79ea 100644
--- a/arch/x86/include/asm/syscalls.h
+++ b/arch/x86/include/asm/syscalls.h
@@ -16,9 +16,12 @@
 #include <linux/types.h>
 
 /* Common in X86_32 and X86_64 */
+
+#ifdef CONFIG_X86_IOPORT
 /* kernel/ioport.c */
 asmlinkage long sys_ioperm(unsigned long, unsigned long, int);
 asmlinkage long sys_iopl(unsigned int);
+#endif /* CONFIG_X86_IOPORT */
 
 /* kernel/ldt.c */
 asmlinkage int sys_modify_ldt(int, void __user *, unsigned long);
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index a5408b9..9644737 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -20,7 +20,8 @@ CFLAGS_irq.o := -I$(src)/../include/asm/trace
 
 obj-y			:= process_$(BITS).o signal.o entry_$(BITS).o
 obj-y			+= traps.o irq.o irq_$(BITS).o dumpstack_$(BITS).o
-obj-y			+= time.o ioport.o ldt.o dumpstack.o nmi.o
+obj-y			+= time.o ldt.o dumpstack.o nmi.o
+obj-$(CONFIG_X86_IOPORT) += ioport.o
 obj-y			+= setup.o x86_init.o i8259.o irqinit.o jump_label.o
 obj-$(CONFIG_IRQ_WORK)  += irq_work.o
 obj-y			+= probe_roms.o
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 2793d1f..01235ef 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1223,7 +1223,6 @@ void cpu_init(void)
 	struct tss_struct *t;
 	unsigned long v;
 	int cpu;
-	int i;
 
 	/*
 	 * Load microcode on this cpu if a valid microcode is available.
@@ -1285,14 +1284,7 @@ void cpu_init(void)
 		}
 	}
 
-	t->x86_tss.io_bitmap_base = offsetof(struct tss_struct, io_bitmap);
-
-	/*
-	 * <= is required because the CPU will access up to
-	 * 8 bits beyond the end of the IO permission bitmap.
-	 */
-	for (i = 0; i <= IO_BITMAP_LONGS; i++)
-		t->io_bitmap[i] = ~0UL;
+	init_tss_io(t);
 
 	atomic_inc(&init_mm.mm_count);
 	me->active_mm = &init_mm;
@@ -1351,7 +1343,7 @@ void cpu_init(void)
 	load_TR_desc();
 	load_LDT(&init_mm.context);
 
-	t->x86_tss.io_bitmap_base = offsetof(struct tss_struct, io_bitmap);
+	init_tss_io(t);
 
 #ifdef CONFIG_DOUBLEFAULT
 	/* Set up doublefault TSS pointer in the GDT */
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 1b69951..eb4f5ba 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -844,6 +844,11 @@ ENTRY(stub_\func)
 END(stub_\func)
 	.endm
 
+	FORK_LIKE  clone
+	FORK_LIKE  fork
+	FORK_LIKE  vfork
+
+#ifdef CONFIG_X86_IOPORT
 	.macro FIXED_FRAME label,func
 ENTRY(\label)
 	CFI_STARTPROC
@@ -856,10 +861,8 @@ ENTRY(\label)
 END(\label)
 	.endm
 
-	FORK_LIKE  clone
-	FORK_LIKE  fork
-	FORK_LIKE  vfork
 	FIXED_FRAME stub_iopl, sys_iopl
+#endif /* CONFIG_X86_IOPORT */
 
 ENTRY(ptregscall_common)
 	DEFAULT_FRAME 1 8	/* offset 8: return address */
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 1b10af8..46da3d9 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -382,7 +382,9 @@ __visible struct pv_cpu_ops pv_cpu_ops = {
 	.iret = native_iret,
 	.swapgs = native_swapgs,
 
+#ifdef CONFIG_X86_IOPORT
 	.set_iopl_mask = native_set_iopl_mask,
+#endif /* CONFIG_X86_IOPORT */
 	.io_delay = native_io_delay,
 
 	.start_context_switch = paravirt_nop,
diff --git a/arch/x86/kernel/process-io.h b/arch/x86/kernel/process-io.h
index d884444..3e773fa 100644
--- a/arch/x86/kernel/process-io.h
+++ b/arch/x86/kernel/process-io.h
@@ -1,9 +1,17 @@
 #ifndef _X86_KERNEL_PROCESS_IO_H
 #define _X86_KERNEL_PROCESS_IO_H
 
+static inline void clear_thread_io_bitmap(struct task_struct *p)
+{
+#ifdef CONFIG_X86_IOPORT
+	p->thread.io_bitmap_ptr = NULL;
+#endif /* CONFIG_X86_IOPORT */
+}
+
 static inline int copy_io_bitmap(struct task_struct *me,
 				 struct task_struct *p)
 {
+#ifdef CONFIG_X86_IOPORT
 	if (unlikely(test_tsk_thread_flag(me, TIF_IO_BITMAP))) {
 		p->thread.io_bitmap_ptr = kmemdup(me->thread.io_bitmap_ptr,
 						  IO_BITMAP_BYTES, GFP_KERNEL);
@@ -15,8 +23,71 @@ static inline int copy_io_bitmap(struct task_struct *me,
 	} else {
 		p->thread.io_bitmap_ptr = NULL;
 	}
+#endif /* CONFIG_X86_IOPORT */
 
 	return 0;
 }
 
+static inline void exit_thread_io(struct task_struct *me)
+{
+#ifdef CONFIG_X86_IOPORT
+        struct thread_struct *t = &me->thread;
+        unsigned long *bp = t->io_bitmap_ptr;
+
+        if (bp) {
+                struct tss_struct *tss = &per_cpu(init_tss, get_cpu());
+
+                t->io_bitmap_ptr = NULL;
+                clear_thread_flag(TIF_IO_BITMAP);
+                /*
+                 * Careful, clear this in the TSS too:
+                 */
+                memset(tss->io_bitmap, 0xff, t->io_bitmap_max);
+                t->io_bitmap_max = 0;
+                put_cpu();
+                kfree(bp);
+        }
+#endif /* CONFIG_X86_IOPORT */
+}
+
+static inline void switch_iopl_mask(struct thread_struct *prev,
+				    struct thread_struct *next)
+{
+#ifdef CONFIG_X86_IOPORT
+        /*
+         * Restore IOPL if needed.  In normal use, the flags restore
+         * in the switch assembly will handle this.  But if the kernel
+         * is running virtualized at a non-zero CPL, the popf will
+         * not restore flags, so it must be done in a separate step.
+         */
+        if (get_kernel_rpl() && unlikely(prev->iopl != next->iopl))
+                set_iopl_mask(next->iopl);
+#endif /* CONFIG_X86_IOPORT */
+}
+
+static inline void switch_io_bitmap(struct tss_struct *tss,
+				    struct task_struct *prev_p,
+				    struct task_struct *next_p)
+{
+#ifdef CONFIG_X86_IOPORT
+        struct thread_struct *prev, *next;
+        prev = &prev_p->thread;
+        next = &next_p->thread;
+
+        if (test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
+                /*
+                 * Copy the relevant range of the IO bitmap.
+                 * Normally this is 128 bytes or less:
+                 */
+                memcpy(tss->io_bitmap, next->io_bitmap_ptr,
+                       max(prev->io_bitmap_max, next->io_bitmap_max));
+        } else if (test_tsk_thread_flag(prev_p, TIF_IO_BITMAP)) {
+                /*
+                 * Clear any possible leftover bits:
+                 */
+                memset(tss->io_bitmap, 0xff, prev->io_bitmap_max);
+        }
+#endif /* CONFIG_X86_IOPORT */
+}
+
 #endif /* _X86_KERNEL_PROCESS_IO_H */
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index c83516b..2df5b00 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -29,6 +29,8 @@
 #include <asm/debugreg.h>
 #include <asm/nmi.h>
 
+#include "process-io.h"
+
 /*
  * per-CPU TSS segments. Threads are completely 'soft' on Linux,
  * no more per-task TSS's. The TSS size is kept cacheline-aligned
@@ -101,23 +103,7 @@ void arch_task_cache_init(void)
 void exit_thread(void)
 {
 	struct task_struct *me = current;
-	struct thread_struct *t = &me->thread;
-	unsigned long *bp = t->io_bitmap_ptr;
-
-	if (bp) {
-		struct tss_struct *tss = &per_cpu(init_tss, get_cpu());
-
-		t->io_bitmap_ptr = NULL;
-		clear_thread_flag(TIF_IO_BITMAP);
-		/*
-		 * Careful, clear this in the TSS too:
-		 */
-		memset(tss->io_bitmap, 0xff, t->io_bitmap_max);
-		t->io_bitmap_max = 0;
-		put_cpu();
-		kfree(bp);
-	}
-
+	exit_thread_io(me);
 	drop_fpu(me);
 }
 
@@ -222,19 +208,7 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
 			hard_enable_TSC();
 	}
 
-	if (test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
-		/*
-		 * Copy the relevant range of the IO bitmap.
-		 * Normally this is 128 bytes or less:
-		 */
-		memcpy(tss->io_bitmap, next->io_bitmap_ptr,
-		       max(prev->io_bitmap_max, next->io_bitmap_max));
-	} else if (test_tsk_thread_flag(prev_p, TIF_IO_BITMAP)) {
-		/*
-		 * Clear any possible leftover bits:
-		 */
-		memset(tss->io_bitmap, 0xff, prev->io_bitmap_max);
-	}
+	switch_io_bitmap(tss, prev_p, next_p);
 	propagate_user_return_notify(prev_p, next_p);
 }
 
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index e86028b..dc780cb 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -155,7 +155,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,
 		childregs->cs = __KERNEL_CS | get_kernel_rpl();
 		childregs->flags = X86_EFLAGS_IF | X86_EFLAGS_FIXED;
 		p->fpu_counter = 0;
-		p->thread.io_bitmap_ptr = NULL;
+		clear_thread_io_bitmap(p);
 		memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps));
 		return 0;
 	}
@@ -269,14 +269,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	 */
 	load_TLS(next, cpu);
 
-	/*
-	 * Restore IOPL if needed.  In normal use, the flags restore
-	 * in the switch assembly will handle this.  But if the kernel
-	 * is running virtualized at a non-zero CPL, the popf will
-	 * not restore flags, so it must be done in a separate step.
-	 */
-	if (get_kernel_rpl() && unlikely(prev->iopl != next->iopl))
-		set_iopl_mask(next->iopl);
+	switch_iopl_mask(prev, next);
 
 	/*
 	 * Now maybe handle debug registers and/or IO bitmaps
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 7461f50..f071bfa 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -785,7 +785,11 @@ static int ptrace_set_debugreg(struct task_struct *tsk, int n,
 static int ioperm_active(struct task_struct *target,
 			 const struct user_regset *regset)
 {
+#ifdef CONFIG_X86_IOPORT
 	return target->thread.io_bitmap_max / regset->size;
+#else
+	return 0;
+#endif
 }
 
 static int ioperm_get(struct task_struct *target,
@@ -793,12 +797,16 @@ static int ioperm_get(struct task_struct *target,
 		      unsigned int pos, unsigned int count,
 		      void *kbuf, void __user *ubuf)
 {
+#ifdef CONFIG_X86_IOPORT
 	if (!target->thread.io_bitmap_ptr)
 		return -ENXIO;
 
 	return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
 				   target->thread.io_bitmap_ptr,
 				   0, IO_BITMAP_BYTES);
+#else
+	return -ENXIO;
+#endif
 }
 
 /*
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index fa6ade7..f485d92 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -911,6 +911,7 @@ static void xen_load_sp0(struct tss_struct *tss,
 	xen_mc_issue(PARAVIRT_LAZY_CPU);
 }
 
+#ifdef CONFIG_X86_IOPORT
 static void xen_set_iopl_mask(unsigned mask)
 {
 	struct physdev_set_iopl set_iopl;
@@ -919,6 +920,7 @@ static void xen_set_iopl_mask(unsigned mask)
 	set_iopl.iopl = (mask == 0) ? 1 : (mask >> 12) & 3;
 	HYPERVISOR_physdev_op(PHYSDEVOP_set_iopl, &set_iopl);
 }
+#endif /* CONFIG_X86_IOPORT */
 
 static void xen_io_delay(void)
 {
@@ -1277,7 +1279,9 @@ static const struct pv_cpu_ops xen_cpu_ops __initconst = {
 	.write_idt_entry = xen_write_idt_entry,
 	.load_sp0 = xen_load_sp0,
 
+#ifdef CONFIG_X86_IOPORT
 	.set_iopl_mask = xen_set_iopl_mask,
+#endif /* CONFIG_X86_IOPORT */
 	.io_delay = xen_io_delay,
 
 	/* Xen takes care of %gs when switching to usermode for us */
diff --git a/drivers/tty/vt/vt_ioctl.c b/drivers/tty/vt/vt_ioctl.c
index 2bd78e2..7c7d405 100644
--- a/drivers/tty/vt/vt_ioctl.c
+++ b/drivers/tty/vt/vt_ioctl.c
@@ -410,7 +410,7 @@ int vt_ioctl(struct tty_struct *tty,
 		 *
 		 * XXX: you should never use these, just call ioperm directly..
 		 */
-#ifdef CONFIG_X86
+#ifdef CONFIG_X86_IOPORT
 	case KDADDIO:
 	case KDDELIO:
 		/*
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 7078052..e7f4a35 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -209,3 +209,8 @@ cond_syscall(compat_sys_open_by_handle_at);
 
 /* compare kernel pointers */
 cond_syscall(sys_kcmp);
+
+/* userspace I/O port access */
+cond_syscall(stub_iopl);
+cond_syscall(sys_iopl);
+cond_syscall(sys_ioperm);
-- 
1.8.4.rc3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] x86: Support compiling out userspace I/O (iopl and ioperm)
  2013-10-22  2:35 ` [PATCH 3/3] x86: Support compiling out userspace I/O (iopl and ioperm) Josh Triplett
@ 2013-10-26  3:17   ` Stephen Hemminger
  2013-10-26  4:30     ` Kees Cook
  2013-10-31 20:04   ` Alexander van Heukelum
  1 sibling, 1 reply; 17+ messages in thread
From: Stephen Hemminger @ 2013-10-26  3:17 UTC (permalink / raw)
  To: Josh Triplett
  Cc: Alexander van Heukelum, Jeremy Fitzhardinge, Daniel Lezcano,
	Fenghua Yu, Frederic Weisbecker, H. Peter Anvin, x86,
	virtualization, Paul Gortmaker, David Herrmann, Masami Hiramatsu,
	Seiji Aguchi, Jiri Slaby, Boris Ostrovsky, Jesper Nilsson,
	Andi Kleen, Raghavendra K T, Ingo Molnar, xen-devel,
	Borislav Petkov, Len Brown, Kees Cook, Konrad Rzeszutek Wilk

I/O from userspace  is used to implement usermode virtio driver(s).
This has been done independently by Intel, Brocade/Vyatta, and 6Wind.
Sorry, it has to stay.


On Mon, Oct 21, 2013 at 7:35 PM, Josh Triplett <josh@joshtriplett.org> wrote:
> On the vast majority of modern systems, no processes will use the
> userspsace I/O syscalls, iopl and ioperm.  Add a new config option,
> CONFIG_X86_IOPORT, to support configuring them out of the kernel
> entirely.  Since these syscalls only exist to support rare legacy
> userspace programs, X86_IOPORT does not depend on EXPERT, though it does
> still default to y.
>
> In addition to saving a significant amount of space, this also reduces
> the size of several major kernel data structures, drops a bit of code
> from several task-related hot paths, and reduces the attack surface of
> the kernel.
>
> All of the task-related code dealing with userspace I/O permissions now
> lives in process-io.h as several new inline functions, which become
> no-ops when CONFIG_X86_IOPORT.
>
> bloat-o-meter shows a net reduction of 17681 bytes on 32-bit and 9719
> bytes on 64-bit:
>
> 32-bit bloat-o-meter:
> add/remove: 0/3 grow/shrink: 0/10 up/down: 0/-17681 (-17681)
> function                                     old     new   delta
> cpu_init                                     676     668      -8
> ioperm_active                                 18       7     -11
> init_task                                   1296    1284     -12
> exit_thread                                  179      91     -88
> ioperm_get                                   103      10     -93
> __switch_to_xtra                             254     161     -93
> sys_iopl                                     127       -    -127
> SyS_iopl                                     127       -    -127
> copy_thread                                  606     446    -160
> vt_ioctl                                    4127    3919    -208
> sys_ioperm                                   370       -    -370
> init_tss                                    8576     384   -8192
> doublefault_tss                             8576     384   -8192
>
> 64-bit bloat-o-meter:
> add/remove: 0/4 grow/shrink: 2/9 up/down: 45/-9764 (-9719)
> function                                     old     new   delta
> cpu_init                                     958     995     +37
> arch_align_stack                              78      86      +8
> perf_event_exit_task                         525     517      -8
> ioperm_active                                 17       8      -9
> init_task                                   1968    1944     -24
> stub_iopl                                     81       -     -81
> ioperm_get                                   111      11    -100
> __switch_to_xtra                             281     164    -117
> exit_thread                                  212      92    -120
> vt_ioctl                                    4432    4304    -128
> sys_iopl                                     137       -    -137
> SyS_iopl                                     137       -    -137
> copy_thread                                  694     520    -174
> sys_ioperm                                   473       -    -473
> init_tss                                    8896     640   -8256
>
> Signed-off-by: Josh Triplett <josh@joshtriplett.org>
> ---
>  arch/x86/Kconfig                      | 10 +++++
>  arch/x86/include/asm/paravirt.h       |  2 +
>  arch/x86/include/asm/paravirt_types.h |  2 +
>  arch/x86/include/asm/processor.h      | 51 +++++++++++++++++++++----
>  arch/x86/include/asm/syscalls.h       |  3 ++
>  arch/x86/kernel/Makefile              |  3 +-
>  arch/x86/kernel/cpu/common.c          | 12 +-----
>  arch/x86/kernel/entry_64.S            |  9 +++--
>  arch/x86/kernel/paravirt.c            |  2 +
>  arch/x86/kernel/process-io.h          | 71 +++++++++++++++++++++++++++++++++++
>  arch/x86/kernel/process.c             | 34 ++---------------
>  arch/x86/kernel/process_32.c          | 11 +-----
>  arch/x86/kernel/ptrace.c              |  8 ++++
>  arch/x86/xen/enlighten.c              |  4 ++
>  drivers/tty/vt/vt_ioctl.c             |  2 +-
>  kernel/sys_ni.c                       |  5 +++
>  16 files changed, 168 insertions(+), 61 deletions(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index e241a19..d5b1e68 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -976,6 +976,16 @@ config VM86
>           XFree86 to initialize some video cards via BIOS. Disabling this
>           option saves about 6k.
>
> +config X86_IOPORT
> +       bool "iopl and ioperm system calls"
> +       default y
> +       ---help---
> +         This option enables the iopl and ioperm system calls, which allow
> +         privileged userspace processes to directly access I/O ports. This
> +         is used by some legacy software to drive hardware directly from
> +         userspace rather than via a proper kernel driver. Unless you intend
> +         to run such software, you can safely say N here.
> +
>  config TOSHIBA
>         tristate "Toshiba Laptop support"
>         depends on X86_32
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index 401f350..2e106d5 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -299,10 +299,12 @@ static inline void write_idt_entry(gate_desc *dt, int entry, const gate_desc *g)
>  {
>         PVOP_VCALL3(pv_cpu_ops.write_idt_entry, dt, entry, g);
>  }
> +#ifdef CONFIG_X86_IOPORT
>  static inline void set_iopl_mask(unsigned mask)
>  {
>         PVOP_VCALL1(pv_cpu_ops.set_iopl_mask, mask);
>  }
> +#endif /* CONFIG_X86_IOPORT */
>
>  /* The paravirtualized I/O functions */
>  static inline void slow_down_io(void)
> diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
> index aab8f67..bc5bfeb 100644
> --- a/arch/x86/include/asm/paravirt_types.h
> +++ b/arch/x86/include/asm/paravirt_types.h
> @@ -142,7 +142,9 @@ struct pv_cpu_ops {
>
>         void (*load_sp0)(struct tss_struct *tss, struct thread_struct *t);
>
> +#ifdef CONFIG_X86_IOPORT
>         void (*set_iopl_mask)(unsigned mask);
> +#endif /* CONFIG_X86_IOPORT */
>
>         void (*wbinvd)(void);
>         void (*io_delay)(void);
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 03d3003..48f6fbe 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -256,7 +256,11 @@ struct x86_hw_tss {
>  /*
>   * IO-bitmap sizes:
>   */
> +#ifdef CONFIG_X86_IOPORT
>  #define IO_BITMAP_BITS                 65536
> +#else
> +#define IO_BITMAP_BITS                 0
> +#endif
>  #define IO_BITMAP_BYTES                        (IO_BITMAP_BITS/8)
>  #define IO_BITMAP_LONGS                        (IO_BITMAP_BYTES/sizeof(long))
>  #define INVALID_IO_BITMAP_OFFSET       0x8000
> @@ -269,6 +273,7 @@ struct tss_struct {
>          */
>         struct x86_hw_tss       x86_tss;
>
> +#ifdef CONFIG_X86_IOPORT
>         /*
>          * The extra 1 is there because the CPU will access an
>          * additional byte beyond the end of the IO permission
> @@ -276,6 +281,7 @@ struct tss_struct {
>          * be within the limit.
>          */
>         unsigned long           io_bitmap[IO_BITMAP_LONGS + 1];
> +#endif /* CONFIG_X86_IOPORT */
>
>         /*
>          * .. and then another 0x100 bytes for the emergency kernel stack:
> @@ -286,6 +292,24 @@ struct tss_struct {
>
>  DECLARE_PER_CPU_SHARED_ALIGNED(struct tss_struct, init_tss);
>
> +static inline void init_tss_io(struct tss_struct *t)
> +{
> +#ifdef CONFIG_X86_IOPORT
> +       int i;
> +
> +       t->x86_tss.io_bitmap_base = offsetof(struct tss_struct, io_bitmap);
> +
> +       /*
> +        * <= is required because the CPU will access up to
> +        * 8 bits beyond the end of the IO permission bitmap.
> +        */
> +       for (i = 0; i <= IO_BITMAP_LONGS; i++)
> +               t->io_bitmap[i] = ~0UL;
> +#else
> +       t->x86_tss.io_bitmap_base = INVALID_IO_BITMAP_OFFSET;
> +#endif
> +}
> +
>  /*
>   * Save the original ist values for checking stack pointers during debugging
>   */
> @@ -484,13 +508,16 @@ struct thread_struct {
>         unsigned int            saved_fs;
>         unsigned int            saved_gs;
>  #endif
> +#ifdef CONFIG_X86_IOPORT
>         /* IO permissions: */
>         unsigned long           *io_bitmap_ptr;
>         unsigned long           iopl;
>         /* Max allowed port in the bitmap, in bytes: */
>         unsigned                io_bitmap_max;
> +#endif /* CONFIG_X86_IOPORT */
>  };
>
> +#ifdef CONFIG_X86_IOPORT
>  /*
>   * Set IOPL bits in EFLAGS from given mask
>   */
> @@ -509,6 +536,7 @@ static inline void native_set_iopl_mask(unsigned mask)
>                       : "i" (~X86_EFLAGS_IOPL), "r" (mask));
>  #endif
>  }
> +#endif /* CONFIG_X86_IOPORT */
>
>  static inline void
>  native_load_sp0(struct tss_struct *tss, struct thread_struct *thread)
> @@ -828,12 +856,8 @@ static inline void spin_lock_prefetch(const void *x)
>  #define STACK_TOP              TASK_SIZE
>  #define STACK_TOP_MAX          STACK_TOP
>
> -#define INIT_THREAD  {                                                   \
> -       .sp0                    = sizeof(init_stack) + (long)&init_stack, \
> -       .vm86_info              = NULL,                                   \
> -       .sysenter_cs            = __KERNEL_CS,                            \
> -       .io_bitmap_ptr          = NULL,                                   \
> -}
> +#ifdef CONFIG_X86_IOPORT
> +#define INIT_THREAD_IO .io_bitmap_ptr = NULL,
>
>  /*
>   * Note that the .io_bitmap member must be extra-big. This is because
> @@ -841,6 +865,19 @@ static inline void spin_lock_prefetch(const void *x)
>   * permission bitmap. The extra byte must be all 1 bits, and must
>   * be within the limit.
>   */
> +#define INIT_TSS_IO .io_bitmap = { [0 ... IO_BITMAP_LONGS] = ~0 },
> +#else
> +#define INIT_THREAD_IO
> +#define INIT_TSS_IO
> +#endif
> +
> +#define INIT_THREAD  {                                                   \
> +       .sp0                    = sizeof(init_stack) + (long)&init_stack, \
> +       .vm86_info              = NULL,                                   \
> +       .sysenter_cs            = __KERNEL_CS,                            \
> +       INIT_THREAD_IO                                                    \
> +}
> +
>  #define INIT_TSS  {                                                      \
>         .x86_tss = {                                                      \
>                 .sp0            = sizeof(init_stack) + (long)&init_stack, \
> @@ -848,7 +885,7 @@ static inline void spin_lock_prefetch(const void *x)
>                 .ss1            = __KERNEL_CS,                            \
>                 .io_bitmap_base = INVALID_IO_BITMAP_OFFSET,               \
>          },                                                               \
> -       .io_bitmap              = { [0 ... IO_BITMAP_LONGS] = ~0 },       \
> +       INIT_TSS_IO                                                       \
>  }
>
>  extern unsigned long thread_saved_pc(struct task_struct *tsk);
> diff --git a/arch/x86/include/asm/syscalls.h b/arch/x86/include/asm/syscalls.h
> index 592a6a6..4db79ea 100644
> --- a/arch/x86/include/asm/syscalls.h
> +++ b/arch/x86/include/asm/syscalls.h
> @@ -16,9 +16,12 @@
>  #include <linux/types.h>
>
>  /* Common in X86_32 and X86_64 */
> +
> +#ifdef CONFIG_X86_IOPORT
>  /* kernel/ioport.c */
>  asmlinkage long sys_ioperm(unsigned long, unsigned long, int);
>  asmlinkage long sys_iopl(unsigned int);
> +#endif /* CONFIG_X86_IOPORT */
>
>  /* kernel/ldt.c */
>  asmlinkage int sys_modify_ldt(int, void __user *, unsigned long);
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index a5408b9..9644737 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -20,7 +20,8 @@ CFLAGS_irq.o := -I$(src)/../include/asm/trace
>
>  obj-y                  := process_$(BITS).o signal.o entry_$(BITS).o
>  obj-y                  += traps.o irq.o irq_$(BITS).o dumpstack_$(BITS).o
> -obj-y                  += time.o ioport.o ldt.o dumpstack.o nmi.o
> +obj-y                  += time.o ldt.o dumpstack.o nmi.o
> +obj-$(CONFIG_X86_IOPORT) += ioport.o
>  obj-y                  += setup.o x86_init.o i8259.o irqinit.o jump_label.o
>  obj-$(CONFIG_IRQ_WORK)  += irq_work.o
>  obj-y                  += probe_roms.o
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 2793d1f..01235ef 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1223,7 +1223,6 @@ void cpu_init(void)
>         struct tss_struct *t;
>         unsigned long v;
>         int cpu;
> -       int i;
>
>         /*
>          * Load microcode on this cpu if a valid microcode is available.
> @@ -1285,14 +1284,7 @@ void cpu_init(void)
>                 }
>         }
>
> -       t->x86_tss.io_bitmap_base = offsetof(struct tss_struct, io_bitmap);
> -
> -       /*
> -        * <= is required because the CPU will access up to
> -        * 8 bits beyond the end of the IO permission bitmap.
> -        */
> -       for (i = 0; i <= IO_BITMAP_LONGS; i++)
> -               t->io_bitmap[i] = ~0UL;
> +       init_tss_io(t);
>
>         atomic_inc(&init_mm.mm_count);
>         me->active_mm = &init_mm;
> @@ -1351,7 +1343,7 @@ void cpu_init(void)
>         load_TR_desc();
>         load_LDT(&init_mm.context);
>
> -       t->x86_tss.io_bitmap_base = offsetof(struct tss_struct, io_bitmap);
> +       init_tss_io(t);
>
>  #ifdef CONFIG_DOUBLEFAULT
>         /* Set up doublefault TSS pointer in the GDT */
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index 1b69951..eb4f5ba 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -844,6 +844,11 @@ ENTRY(stub_\func)
>  END(stub_\func)
>         .endm
>
> +       FORK_LIKE  clone
> +       FORK_LIKE  fork
> +       FORK_LIKE  vfork
> +
> +#ifdef CONFIG_X86_IOPORT
>         .macro FIXED_FRAME label,func
>  ENTRY(\label)
>         CFI_STARTPROC
> @@ -856,10 +861,8 @@ ENTRY(\label)
>  END(\label)
>         .endm
>
> -       FORK_LIKE  clone
> -       FORK_LIKE  fork
> -       FORK_LIKE  vfork
>         FIXED_FRAME stub_iopl, sys_iopl
> +#endif /* CONFIG_X86_IOPORT */
>
>  ENTRY(ptregscall_common)
>         DEFAULT_FRAME 1 8       /* offset 8: return address */
> diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
> index 1b10af8..46da3d9 100644
> --- a/arch/x86/kernel/paravirt.c
> +++ b/arch/x86/kernel/paravirt.c
> @@ -382,7 +382,9 @@ __visible struct pv_cpu_ops pv_cpu_ops = {
>         .iret = native_iret,
>         .swapgs = native_swapgs,
>
> +#ifdef CONFIG_X86_IOPORT
>         .set_iopl_mask = native_set_iopl_mask,
> +#endif /* CONFIG_X86_IOPORT */
>         .io_delay = native_io_delay,
>
>         .start_context_switch = paravirt_nop,
> diff --git a/arch/x86/kernel/process-io.h b/arch/x86/kernel/process-io.h
> index d884444..3e773fa 100644
> --- a/arch/x86/kernel/process-io.h
> +++ b/arch/x86/kernel/process-io.h
> @@ -1,9 +1,17 @@
>  #ifndef _X86_KERNEL_PROCESS_IO_H
>  #define _X86_KERNEL_PROCESS_IO_H
>
> +static inline void clear_thread_io_bitmap(struct task_struct *p)
> +{
> +#ifdef CONFIG_X86_IOPORT
> +       p->thread.io_bitmap_ptr = NULL;
> +#endif /* CONFIG_X86_IOPORT */
> +}
> +
>  static inline int copy_io_bitmap(struct task_struct *me,
>                                  struct task_struct *p)
>  {
> +#ifdef CONFIG_X86_IOPORT
>         if (unlikely(test_tsk_thread_flag(me, TIF_IO_BITMAP))) {
>                 p->thread.io_bitmap_ptr = kmemdup(me->thread.io_bitmap_ptr,
>                                                   IO_BITMAP_BYTES, GFP_KERNEL);
> @@ -15,8 +23,71 @@ static inline int copy_io_bitmap(struct task_struct *me,
>         } else {
>                 p->thread.io_bitmap_ptr = NULL;
>         }
> +#endif /* CONFIG_X86_IOPORT */
>
>         return 0;
>  }
>
> +static inline void exit_thread_io(struct task_struct *me)
> +{
> +#ifdef CONFIG_X86_IOPORT
> +        struct thread_struct *t = &me->thread;
> +        unsigned long *bp = t->io_bitmap_ptr;
> +
> +        if (bp) {
> +                struct tss_struct *tss = &per_cpu(init_tss, get_cpu());
> +
> +                t->io_bitmap_ptr = NULL;
> +                clear_thread_flag(TIF_IO_BITMAP);
> +                /*
> +                 * Careful, clear this in the TSS too:
> +                 */
> +                memset(tss->io_bitmap, 0xff, t->io_bitmap_max);
> +                t->io_bitmap_max = 0;
> +                put_cpu();
> +                kfree(bp);
> +        }
> +#endif /* CONFIG_X86_IOPORT */
> +}
> +
> +static inline void switch_iopl_mask(struct thread_struct *prev,
> +                                   struct thread_struct *next)
> +{
> +#ifdef CONFIG_X86_IOPORT
> +        /*
> +         * Restore IOPL if needed.  In normal use, the flags restore
> +         * in the switch assembly will handle this.  But if the kernel
> +         * is running virtualized at a non-zero CPL, the popf will
> +         * not restore flags, so it must be done in a separate step.
> +         */
> +        if (get_kernel_rpl() && unlikely(prev->iopl != next->iopl))
> +                set_iopl_mask(next->iopl);
> +#endif /* CONFIG_X86_IOPORT */
> +}
> +
> +static inline void switch_io_bitmap(struct tss_struct *tss,
> +                                   struct task_struct *prev_p,
> +                                   struct task_struct *next_p)
> +{
> +#ifdef CONFIG_X86_IOPORT
> +        struct thread_struct *prev, *next;
> +        prev = &prev_p->thread;
> +        next = &next_p->thread;
> +
> +        if (test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
> +                /*
> +                 * Copy the relevant range of the IO bitmap.
> +                 * Normally this is 128 bytes or less:
> +                 */
> +                memcpy(tss->io_bitmap, next->io_bitmap_ptr,
> +                       max(prev->io_bitmap_max, next->io_bitmap_max));
> +        } else if (test_tsk_thread_flag(prev_p, TIF_IO_BITMAP)) {
> +                /*
> +                 * Clear any possible leftover bits:
> +                 */
> +                memset(tss->io_bitmap, 0xff, prev->io_bitmap_max);
> +        }
> +#endif /* CONFIG_X86_IOPORT */
> +}
> +
>  #endif /* _X86_KERNEL_PROCESS_IO_H */
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index c83516b..2df5b00 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -29,6 +29,8 @@
>  #include <asm/debugreg.h>
>  #include <asm/nmi.h>
>
> +#include "process-io.h"
> +
>  /*
>   * per-CPU TSS segments. Threads are completely 'soft' on Linux,
>   * no more per-task TSS's. The TSS size is kept cacheline-aligned
> @@ -101,23 +103,7 @@ void arch_task_cache_init(void)
>  void exit_thread(void)
>  {
>         struct task_struct *me = current;
> -       struct thread_struct *t = &me->thread;
> -       unsigned long *bp = t->io_bitmap_ptr;
> -
> -       if (bp) {
> -               struct tss_struct *tss = &per_cpu(init_tss, get_cpu());
> -
> -               t->io_bitmap_ptr = NULL;
> -               clear_thread_flag(TIF_IO_BITMAP);
> -               /*
> -                * Careful, clear this in the TSS too:
> -                */
> -               memset(tss->io_bitmap, 0xff, t->io_bitmap_max);
> -               t->io_bitmap_max = 0;
> -               put_cpu();
> -               kfree(bp);
> -       }
> -
> +       exit_thread_io(me);
>         drop_fpu(me);
>  }
>
> @@ -222,19 +208,7 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
>                         hard_enable_TSC();
>         }
>
> -       if (test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
> -               /*
> -                * Copy the relevant range of the IO bitmap.
> -                * Normally this is 128 bytes or less:
> -                */
> -               memcpy(tss->io_bitmap, next->io_bitmap_ptr,
> -                      max(prev->io_bitmap_max, next->io_bitmap_max));
> -       } else if (test_tsk_thread_flag(prev_p, TIF_IO_BITMAP)) {
> -               /*
> -                * Clear any possible leftover bits:
> -                */
> -               memset(tss->io_bitmap, 0xff, prev->io_bitmap_max);
> -       }
> +       switch_io_bitmap(tss, prev_p, next_p);
>         propagate_user_return_notify(prev_p, next_p);
>  }
>
> diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
> index e86028b..dc780cb 100644
> --- a/arch/x86/kernel/process_32.c
> +++ b/arch/x86/kernel/process_32.c
> @@ -155,7 +155,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,
>                 childregs->cs = __KERNEL_CS | get_kernel_rpl();
>                 childregs->flags = X86_EFLAGS_IF | X86_EFLAGS_FIXED;
>                 p->fpu_counter = 0;
> -               p->thread.io_bitmap_ptr = NULL;
> +               clear_thread_io_bitmap(p);
>                 memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps));
>                 return 0;
>         }
> @@ -269,14 +269,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
>          */
>         load_TLS(next, cpu);
>
> -       /*
> -        * Restore IOPL if needed.  In normal use, the flags restore
> -        * in the switch assembly will handle this.  But if the kernel
> -        * is running virtualized at a non-zero CPL, the popf will
> -        * not restore flags, so it must be done in a separate step.
> -        */
> -       if (get_kernel_rpl() && unlikely(prev->iopl != next->iopl))
> -               set_iopl_mask(next->iopl);
> +       switch_iopl_mask(prev, next);
>
>         /*
>          * Now maybe handle debug registers and/or IO bitmaps
> diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
> index 7461f50..f071bfa 100644
> --- a/arch/x86/kernel/ptrace.c
> +++ b/arch/x86/kernel/ptrace.c
> @@ -785,7 +785,11 @@ static int ptrace_set_debugreg(struct task_struct *tsk, int n,
>  static int ioperm_active(struct task_struct *target,
>                          const struct user_regset *regset)
>  {
> +#ifdef CONFIG_X86_IOPORT
>         return target->thread.io_bitmap_max / regset->size;
> +#else
> +       return 0;
> +#endif
>  }
>
>  static int ioperm_get(struct task_struct *target,
> @@ -793,12 +797,16 @@ static int ioperm_get(struct task_struct *target,
>                       unsigned int pos, unsigned int count,
>                       void *kbuf, void __user *ubuf)
>  {
> +#ifdef CONFIG_X86_IOPORT
>         if (!target->thread.io_bitmap_ptr)
>                 return -ENXIO;
>
>         return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
>                                    target->thread.io_bitmap_ptr,
>                                    0, IO_BITMAP_BYTES);
> +#else
> +       return -ENXIO;
> +#endif
>  }
>
>  /*
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index fa6ade7..f485d92 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -911,6 +911,7 @@ static void xen_load_sp0(struct tss_struct *tss,
>         xen_mc_issue(PARAVIRT_LAZY_CPU);
>  }
>
> +#ifdef CONFIG_X86_IOPORT
>  static void xen_set_iopl_mask(unsigned mask)
>  {
>         struct physdev_set_iopl set_iopl;
> @@ -919,6 +920,7 @@ static void xen_set_iopl_mask(unsigned mask)
>         set_iopl.iopl = (mask == 0) ? 1 : (mask >> 12) & 3;
>         HYPERVISOR_physdev_op(PHYSDEVOP_set_iopl, &set_iopl);
>  }
> +#endif /* CONFIG_X86_IOPORT */
>
>  static void xen_io_delay(void)
>  {
> @@ -1277,7 +1279,9 @@ static const struct pv_cpu_ops xen_cpu_ops __initconst = {
>         .write_idt_entry = xen_write_idt_entry,
>         .load_sp0 = xen_load_sp0,
>
> +#ifdef CONFIG_X86_IOPORT
>         .set_iopl_mask = xen_set_iopl_mask,
> +#endif /* CONFIG_X86_IOPORT */
>         .io_delay = xen_io_delay,
>
>         /* Xen takes care of %gs when switching to usermode for us */
> diff --git a/drivers/tty/vt/vt_ioctl.c b/drivers/tty/vt/vt_ioctl.c
> index 2bd78e2..7c7d405 100644
> --- a/drivers/tty/vt/vt_ioctl.c
> +++ b/drivers/tty/vt/vt_ioctl.c
> @@ -410,7 +410,7 @@ int vt_ioctl(struct tty_struct *tty,
>                  *
>                  * XXX: you should never use these, just call ioperm directly..
>                  */
> -#ifdef CONFIG_X86
> +#ifdef CONFIG_X86_IOPORT
>         case KDADDIO:
>         case KDDELIO:
>                 /*
> diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> index 7078052..e7f4a35 100644
> --- a/kernel/sys_ni.c
> +++ b/kernel/sys_ni.c
> @@ -209,3 +209,8 @@ cond_syscall(compat_sys_open_by_handle_at);
>
>  /* compare kernel pointers */
>  cond_syscall(sys_kcmp);
> +
> +/* userspace I/O port access */
> +cond_syscall(stub_iopl);
> +cond_syscall(sys_iopl);
> +cond_syscall(sys_ioperm);
> --
> 1.8.4.rc3
>
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] x86: Support compiling out userspace I/O (iopl and ioperm)
  2013-10-26  3:17   ` Stephen Hemminger
@ 2013-10-26  4:30     ` Kees Cook
  0 siblings, 0 replies; 17+ messages in thread
From: Kees Cook @ 2013-10-26  4:30 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Alexander van Heukelum, Jeremy Fitzhardinge, Daniel Lezcano,
	Fenghua Yu, Frederic Weisbecker, David Herrmann, x86@kernel.org,
	virtualization@lists.linux-foundation.org, Paul Gortmaker,
	H. Peter Anvin, Masami Hiramatsu, Seiji Aguchi, Jiri Slaby,
	Boris Ostrovsky, Jesper Nilsson, Andi Kleen, Raghavendra K T,
	Ingo Molnar, xen-devel, Borislav Petkov, Len Brown, Konrad

On Sat, Oct 26, 2013 at 4:17 AM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> I/O from userspace  is used to implement usermode virtio driver(s).
> This has been done independently by Intel, Brocade/Vyatta, and 6Wind.
> Sorry, it has to stay.

This isn't about removing it, it's about putting it behind a default=y
CONFIG. At the very least, the cleanups and 32/64 merging look
sensible.

As for the CONFIG, I think it'd be useful to have that, since some
more specialized users may not want IO ports at all (some folks see IO
ports as dangerous since they are a potential way to make a jump from
uid-0 to ring-0).

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] x86: process: Unify 32-bit and 64-bit copy_thread I/O bitmap handling
  2013-10-22  2:34 ` [PATCH 1/3] x86: process: Unify 32-bit and 64-bit copy_thread I/O bitmap handling Josh Triplett
@ 2013-10-30 22:21   ` Kees Cook
  2013-10-31 20:01   ` Alexander van Heukelum
  1 sibling, 0 replies; 17+ messages in thread
From: Kees Cook @ 2013-10-30 22:21 UTC (permalink / raw)
  To: Josh Triplett
  Cc: Alexander van Heukelum, Jeremy Fitzhardinge, x86@kernel.org,
	Len Brown, Frederic Weisbecker, H. Peter Anvin,
	virtualization@lists.linux-foundation.org, Paul Gortmaker,
	Raghavendra K T, David Herrmann, Masami Hiramatsu, Seiji Aguchi,
	Jiri Slaby, Alok Kataria, Jesper Nilsson, Andi Kleen,
	Daniel Lezcano, Ingo Molnar, Steven Rostedt, xen-devel,
	Borislav Petkov, Fenghua Yu <feng>

On Mon, Oct 21, 2013 at 7:34 PM, Josh Triplett <josh@joshtriplett.org> wrote:
> The 32-bit and 64-bit versions of copy_thread have functionally
> identical handling for copying the I/O bitmap, modulo differences in
> error handling.  Clean up the error paths in both by moving the copy of
> the I/O bitmap to the end, to eliminate the need to free it if
> subsequent copy steps fail; move the resulting identical code to a
> static inline in a common header.
>
> Signed-off-by: Josh Triplett <josh@joshtriplett.org>

This seems quite sensible to me.

Acked-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] x86: tss: Eliminate fragile calculation of TSS segment limit
  2013-10-22  2:34 ` [PATCH 2/3] x86: tss: Eliminate fragile calculation of TSS segment limit Josh Triplett
@ 2013-10-30 22:22   ` Kees Cook
  2013-10-30 22:53     ` H. Peter Anvin
  2013-10-31 11:12     ` Josh Triplett
  2013-10-31 20:02   ` Alexander van Heukelum
  1 sibling, 2 replies; 17+ messages in thread
From: Kees Cook @ 2013-10-30 22:22 UTC (permalink / raw)
  To: Josh Triplett
  Cc: Alexander van Heukelum, Jeremy Fitzhardinge, x86@kernel.org,
	Len Brown, Frederic Weisbecker, H. Peter Anvin,
	virtualization@lists.linux-foundation.org, Paul Gortmaker,
	Raghavendra K T, David Herrmann, Masami Hiramatsu, Seiji Aguchi,
	Jiri Slaby, Alok Kataria, Jesper Nilsson, Andi Kleen,
	Daniel Lezcano, Ingo Molnar, Steven Rostedt, xen-devel,
	Borislav Petkov, Fenghua Yu <feng>

On Mon, Oct 21, 2013 at 7:34 PM, Josh Triplett <josh@joshtriplett.org> wrote:
> __set_tss_desc has a complex calculation of the TSS segment limit,
> duplicating the quirky details of the I/O bitmap array length, and
> requiring a complex comment to explain.  Replace that calculation with a
> simpler one based on the offsetof the "stack" field that follows the
> array.
>
> That then removes the last use of IO_BITMAP_OFFSET, so delete it.
>
> Signed-off-by: Josh Triplett <josh@joshtriplett.org>
> ---
>  arch/x86/include/asm/desc.h      | 11 +----------
>  arch/x86/include/asm/processor.h |  3 ++-
>  2 files changed, 3 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
> index b90e5df..17ac92f 100644
> --- a/arch/x86/include/asm/desc.h
> +++ b/arch/x86/include/asm/desc.h
> @@ -177,16 +177,7 @@ static inline void __set_tss_desc(unsigned cpu, unsigned int entry, void *addr)
>         struct desc_struct *d = get_cpu_gdt_table(cpu);
>         tss_desc tss;
>
> -       /*
> -        * sizeof(unsigned long) coming from an extra "long" at the end
> -        * of the iobitmap. See tss_struct definition in processor.h
> -        *
> -        * -1? seg base+limit should be pointing to the address of the
> -        * last valid byte

I think it might be better to keep at least a minimal comment near the
TSS_LIMIT declaration, just to explain the "-1" part, which is not
entirely obvious from just reading the code.

-Kees

> -        */
> -       set_tssldt_descriptor(&tss, (unsigned long)addr, DESC_TSS,
> -                             IO_BITMAP_OFFSET + IO_BITMAP_BYTES +
> -                             sizeof(unsigned long) - 1);
> +       set_tssldt_descriptor(&tss, (unsigned long)addr, DESC_TSS, TSS_LIMIT);
>         write_gdt_entry(d, entry, &tss, DESC_TSS);
>  }
>
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 987c75e..03d3003 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -259,9 +259,10 @@ struct x86_hw_tss {
>  #define IO_BITMAP_BITS                 65536
>  #define IO_BITMAP_BYTES                        (IO_BITMAP_BITS/8)
>  #define IO_BITMAP_LONGS                        (IO_BITMAP_BYTES/sizeof(long))
> -#define IO_BITMAP_OFFSET               offsetof(struct tss_struct, io_bitmap)
>  #define INVALID_IO_BITMAP_OFFSET       0x8000
>
> +#define TSS_LIMIT      (offsetof(struct tss_struct, stack) - 1)
> +
>  struct tss_struct {
>         /*
>          * The hardware state:
> --
> 1.8.4.rc3
>



-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] x86: tss: Eliminate fragile calculation of TSS segment limit
  2013-10-30 22:22   ` Kees Cook
@ 2013-10-30 22:53     ` H. Peter Anvin
  2013-10-31 11:17       ` Josh Triplett
  2013-10-31 11:12     ` Josh Triplett
  1 sibling, 1 reply; 17+ messages in thread
From: H. Peter Anvin @ 2013-10-30 22:53 UTC (permalink / raw)
  To: Kees Cook, Josh Triplett
  Cc: Alexander van Heukelum, Jeremy Fitzhardinge, x86@kernel.org,
	Len Brown, Frederic Weisbecker,
	virtualization@lists.linux-foundation.org, Paul Gortmaker,
	Raghavendra K T, David Herrmann, Masami Hiramatsu, Seiji Aguchi,
	Jiri Slaby, Alok Kataria, Jesper Nilsson, Andi Kleen,
	Daniel Lezcano, Ingo Molnar, Steven Rostedt, xen-devel,
	Borislav Petkov, Fenghua Yu, Konrad

On 10/30/2013 03:22 PM, Kees Cook wrote:
>>
>> -       /*
>> -        * sizeof(unsigned long) coming from an extra "long" at the end
>> -        * of the iobitmap. See tss_struct definition in processor.h
>> -        *
>> -        * -1? seg base+limit should be pointing to the address of the
>> -        * last valid byte
> 
> I think it might be better to keep at least a minimal comment near the
> TSS_LIMIT declaration, just to explain the "-1" part, which is not
> entirely obvious from just reading the code.
> 

Agreed, although it doesn't need to be an unsigned long at all... the
CPU will only ever access one extra byte past the end.

	-hpa

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] x86: tss: Eliminate fragile calculation of TSS segment limit
  2013-10-30 22:22   ` Kees Cook
  2013-10-30 22:53     ` H. Peter Anvin
@ 2013-10-31 11:12     ` Josh Triplett
  1 sibling, 0 replies; 17+ messages in thread
From: Josh Triplett @ 2013-10-31 11:12 UTC (permalink / raw)
  To: Kees Cook
  Cc: Alexander van Heukelum, Jeremy Fitzhardinge, x86@kernel.org,
	Len Brown, Frederic Weisbecker, H. Peter Anvin,
	virtualization@lists.linux-foundation.org, Paul Gortmaker,
	Raghavendra K T, David Herrmann, Masami Hiramatsu, Seiji Aguchi,
	Jiri Slaby, Alok Kataria, Jesper Nilsson, Andi Kleen,
	Daniel Lezcano, Ingo Molnar, Steven Rostedt, xen-devel,
	Borislav Petkov, Fenghua Yu <feng>

On Wed, Oct 30, 2013 at 03:22:33PM -0700, Kees Cook wrote:
> On Mon, Oct 21, 2013 at 7:34 PM, Josh Triplett <josh@joshtriplett.org> wrote:
> > __set_tss_desc has a complex calculation of the TSS segment limit,
> > duplicating the quirky details of the I/O bitmap array length, and
> > requiring a complex comment to explain.  Replace that calculation with a
> > simpler one based on the offsetof the "stack" field that follows the
> > array.
> >
> > That then removes the last use of IO_BITMAP_OFFSET, so delete it.
> >
> > Signed-off-by: Josh Triplett <josh@joshtriplett.org>
> > ---
> >  arch/x86/include/asm/desc.h      | 11 +----------
> >  arch/x86/include/asm/processor.h |  3 ++-
> >  2 files changed, 3 insertions(+), 11 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
> > index b90e5df..17ac92f 100644
> > --- a/arch/x86/include/asm/desc.h
> > +++ b/arch/x86/include/asm/desc.h
> > @@ -177,16 +177,7 @@ static inline void __set_tss_desc(unsigned cpu, unsigned int entry, void *addr)
> >         struct desc_struct *d = get_cpu_gdt_table(cpu);
> >         tss_desc tss;
> >
> > -       /*
> > -        * sizeof(unsigned long) coming from an extra "long" at the end
> > -        * of the iobitmap. See tss_struct definition in processor.h
> > -        *
> > -        * -1? seg base+limit should be pointing to the address of the
> > -        * last valid byte
> 
> I think it might be better to keep at least a minimal comment near the
> TSS_LIMIT declaration, just to explain the "-1" part, which is not
> entirely obvious from just reading the code.

Fair enough; I've added an appropriate comment next to TSS_LIMIT, and
I'll include that in PATCHv2, which I'll send out as soon as I see
any feedback on patch 3/3.

- Josh Triplett

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] x86: tss: Eliminate fragile calculation of TSS segment limit
  2013-10-30 22:53     ` H. Peter Anvin
@ 2013-10-31 11:17       ` Josh Triplett
  0 siblings, 0 replies; 17+ messages in thread
From: Josh Triplett @ 2013-10-31 11:17 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Alexander van Heukelum, Jeremy Fitzhardinge, Daniel Lezcano,
	Len Brown, Frederic Weisbecker,
	virtualization@lists.linux-foundation.org, Paul Gortmaker,
	Raghavendra K T, David Herrmann, Masami Hiramatsu, Seiji Aguchi,
	Jiri Slaby, Alok Kataria, Jesper Nilsson, Andi Kleen,
	x86@kernel.org, Ingo Molnar, Steven Rostedt, xen-devel,
	Borislav Petkov, Fenghua Yu, Kees Cook <kees>

On Wed, Oct 30, 2013 at 03:53:11PM -0700, H. Peter Anvin wrote:
> On 10/30/2013 03:22 PM, Kees Cook wrote:
> >>
> >> -       /*
> >> -        * sizeof(unsigned long) coming from an extra "long" at the end
> >> -        * of the iobitmap. See tss_struct definition in processor.h
> >> -        *
> >> -        * -1? seg base+limit should be pointing to the address of the
> >> -        * last valid byte
> > 
> > I think it might be better to keep at least a minimal comment near the
> > TSS_LIMIT declaration, just to explain the "-1" part, which is not
> > entirely obvious from just reading the code.
> > 
> 
> Agreed, although it doesn't need to be an unsigned long at all... the
> CPU will only ever access one extra byte past the end.

True, but the thing immediately following the iobitmap is a stack, which
needs aligning, so the array does need to contain a full additional
unsigned long even if the CPU only accesses a byte of it.  In any case,
that isn't the reason for the -1, just the reason for the
sizeof(unsigned long) mentioned in the comment above, which goes away
now that TSS_LIMIT uses the offset of the *following* field rather than
recalculating the size of the iobitmap.

- Josh Triplett

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] x86: process: Unify 32-bit and 64-bit copy_thread I/O bitmap handling
  2013-10-22  2:34 ` [PATCH 1/3] x86: process: Unify 32-bit and 64-bit copy_thread I/O bitmap handling Josh Triplett
  2013-10-30 22:21   ` Kees Cook
@ 2013-10-31 20:01   ` Alexander van Heukelum
  2013-11-01 16:33     ` Josh Triplett
  1 sibling, 1 reply; 17+ messages in thread
From: Alexander van Heukelum @ 2013-10-31 20:01 UTC (permalink / raw)
  To: Josh Triplett, linux-kernel, virtualization, x86, xen-devel
  Cc: Jeremy Fitzhardinge, Raghavendra K T, Len Brown,
	Frederic Weisbecker, H. Peter Anvin, Paul Gortmaker,
	David Herrmann, Masami Hiramatsu, Seiji Aguchi, Jiri Slaby,
	Alok Kataria, Jesper Nilsson, Andi Kleen, Daniel Lezcano,
	Ingo Molnar, Steven Rostedt, Borislav Petkov, Fenghua Yu,
	Kees Cook, Konrad Rzeszutek Wilk, Ross Lagerwall, Chris Metcalf,
	Chris Wright

Hi Josh,

Since you added me to the CC, I had a look... Comments inline

On Tue, Oct 22, 2013, at 3:34, Josh Triplett wrote:
> The 32-bit and 64-bit versions of copy_thread have functionally
> identical handling for copying the I/O bitmap, modulo differences in
> error handling.  Clean up the error paths in both by moving the copy of
> the I/O bitmap to the end, to eliminate the need to free it if
> subsequent copy steps fail; move the resulting identical code to a
> static inline in a common header.
> 
> Signed-off-by: Josh Triplett <josh@joshtriplett.org>
> ---
>  arch/x86/kernel/process-io.h | 22 ++++++++++++++++++++++

I don't particularly like this new file. It's also not in a proper place.

>  arch/x86/kernel/process_32.c | 29 ++++++++---------------------
>  arch/x86/kernel/process_64.c | 26 +++++---------------------

Yay, this I like :). However, unification is best done in a separate step. First
make the two parts equal in one or two patches. Then, in a separate patch,
do the mechanical unification.

>  3 files changed, 35 insertions(+), 42 deletions(-)
>  create mode 100644 arch/x86/kernel/process-io.h
> 
> diff --git a/arch/x86/kernel/process-io.h b/arch/x86/kernel/process-io.h
> new file mode 100644
> index 0000000..d884444
> --- /dev/null
> +++ b/arch/x86/kernel/process-io.h
> @@ -0,0 +1,22 @@
> +#ifndef _X86_KERNEL_PROCESS_IO_H
> +#define _X86_KERNEL_PROCESS_IO_H
> +
> +static inline int copy_io_bitmap(struct task_struct *me,
> +				 struct task_struct *p)
> +{
> +	if (unlikely(test_tsk_thread_flag(me, TIF_IO_BITMAP))) {
> +		p->thread.io_bitmap_ptr = kmemdup(me->thread.io_bitmap_ptr,
> +						  IO_BITMAP_BYTES, GFP_KERNEL);
> +		if (!p->thread.io_bitmap_ptr) {
> +			p->thread.io_bitmap_max = 0;
> +			return -ENOMEM;
> +		}
> +		set_tsk_thread_flag(p, TIF_IO_BITMAP);
> +	} else {
> +		p->thread.io_bitmap_ptr = NULL;
> +	}
> +
> +	return 0;
> +}
> +

I really don't like the .h-file like this. Could you try avoiding it? I think
it's possible to unify the copy_thread function and in the end move it
to process.c instead. The arch-specific parts can be split out in static
functions guarded by CONFIG_X86_64 (cooking up good names is
the challenge here ;) ).

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] x86: tss: Eliminate fragile calculation of TSS segment limit
  2013-10-22  2:34 ` [PATCH 2/3] x86: tss: Eliminate fragile calculation of TSS segment limit Josh Triplett
  2013-10-30 22:22   ` Kees Cook
@ 2013-10-31 20:02   ` Alexander van Heukelum
  2013-11-01 16:40     ` Josh Triplett
  1 sibling, 1 reply; 17+ messages in thread
From: Alexander van Heukelum @ 2013-10-31 20:02 UTC (permalink / raw)
  To: Josh Triplett, linux-kernel, virtualization, x86, xen-devel
  Cc: Jeremy Fitzhardinge, Raghavendra K T, Len Brown,
	Frederic Weisbecker, H. Peter Anvin, Paul Gortmaker,
	David Herrmann, Masami Hiramatsu, Seiji Aguchi, Jiri Slaby,
	Alok Kataria, Jesper Nilsson, Andi Kleen, Daniel Lezcano,
	Ingo Molnar, Steven Rostedt, Borislav Petkov, Fenghua Yu,
	Kees Cook, Konrad Rzeszutek Wilk, Ross Lagerwall, Chris Metcalf,
	Chris Wright

Hi Josh,

On Tue, Oct 22, 2013, at 3:34, Josh Triplett wrote:
> __set_tss_desc has a complex calculation of the TSS segment limit,
> duplicating the quirky details of the I/O bitmap array length, and
> requiring a complex comment to explain.  Replace that calculation with a
> simpler one based on the offsetof the "stack" field that follows the
> array.
> 
> That then removes the last use of IO_BITMAP_OFFSET, so delete it.
> 
> Signed-off-by: Josh Triplett <josh@joshtriplett.org>
> ---
>  arch/x86/include/asm/desc.h      | 11 +----------
>  arch/x86/include/asm/processor.h |  3 ++-
>  2 files changed, 3 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
> index b90e5df..17ac92f 100644
> --- a/arch/x86/include/asm/desc.h
> +++ b/arch/x86/include/asm/desc.h
> @@ -177,16 +177,7 @@ static inline void __set_tss_desc(unsigned cpu, unsigned int entry, void *addr)
>  	struct desc_struct *d = get_cpu_gdt_table(cpu);
>  	tss_desc tss;
>  
> -	/*
> -	 * sizeof(unsigned long) coming from an extra "long" at the end
> -	 * of the iobitmap. See tss_struct definition in processor.h
> -	 *
> -	 * -1? seg base+limit should be pointing to the address of the
> -	 * last valid byte
> -	 */
> -	set_tssldt_descriptor(&tss, (unsigned long)addr, DESC_TSS,
> -			      IO_BITMAP_OFFSET + IO_BITMAP_BYTES +
> -			      sizeof(unsigned long) - 1);
> +	set_tssldt_descriptor(&tss, (unsigned long)addr, DESC_TSS, TSS_LIMIT);
>  	write_gdt_entry(d, entry, &tss, DESC_TSS);
>  }
>  
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 987c75e..03d3003 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -259,9 +259,10 @@ struct x86_hw_tss {
>  #define IO_BITMAP_BITS			65536
>  #define IO_BITMAP_BYTES			(IO_BITMAP_BITS/8)
>  #define IO_BITMAP_LONGS			(IO_BITMAP_BYTES/sizeof(long))
> -#define IO_BITMAP_OFFSET		offsetof(struct tss_struct, io_bitmap)
>  #define INVALID_IO_BITMAP_OFFSET	0x8000
>  
> +#define TSS_LIMIT	(offsetof(struct tss_struct, stack) - 1)
> +

I wonder if the 'stack' in tss_struct has any meaning anymore. Can it be removed?
If so, the offsetof can be changed to sizeof(struct tss_struct), which looks more
natural to me.

Even so, I think this is already an improvement.

Acked-by: Alexander van Heukelum <heukelum@fastmail.fm>

>  struct tss_struct {
>  	/*
>  	 * The hardware state:
> -- 
> 1.8.4.rc3
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] x86: Support compiling out userspace I/O (iopl and ioperm)
  2013-10-22  2:35 ` [PATCH 3/3] x86: Support compiling out userspace I/O (iopl and ioperm) Josh Triplett
  2013-10-26  3:17   ` Stephen Hemminger
@ 2013-10-31 20:04   ` Alexander van Heukelum
  2013-11-01 17:19     ` Josh Triplett
  1 sibling, 1 reply; 17+ messages in thread
From: Alexander van Heukelum @ 2013-10-31 20:04 UTC (permalink / raw)
  To: Josh Triplett, linux-kernel, virtualization, x86, xen-devel
  Cc: Jeremy Fitzhardinge, Raghavendra K T, Len Brown,
	Frederic Weisbecker, H. Peter Anvin, Paul Gortmaker,
	David Herrmann, Masami Hiramatsu, Seiji Aguchi, Jiri Slaby,
	Alok Kataria, Jesper Nilsson, Andi Kleen, Daniel Lezcano,
	Ingo Molnar, Steven Rostedt, Borislav Petkov, Fenghua Yu,
	Kees Cook, Konrad Rzeszutek Wilk, Ross Lagerwall, Chris Metcalf,
	Chris Wright

Hi Josh,

On Tue, Oct 22, 2013, at 3:35, Josh Triplett wrote:
> On the vast majority of modern systems, no processes will use the
> userspsace I/O syscalls, iopl and ioperm.  Add a new config option,
> CONFIG_X86_IOPORT, to support configuring them out of the kernel
> entirely.  Since these syscalls only exist to support rare legacy
> userspace programs, X86_IOPORT does not depend on EXPERT, though it does
> still default to y.
> 
> In addition to saving a significant amount of space, this also reduces
> the size of several major kernel data structures, drops a bit of code
> from several task-related hot paths, and reduces the attack surface of
> the kernel.
> 
> All of the task-related code dealing with userspace I/O permissions now
> lives in process-io.h as several new inline functions, which become
> no-ops when CONFIG_X86_IOPORT.
> 
> bloat-o-meter shows a net reduction of 17681 bytes on 32-bit and 9719
> bytes on 64-bit:
> 
> 32-bit bloat-o-meter:
> add/remove: 0/3 grow/shrink: 0/10 up/down: 0/-17681 (-17681)
> function                                     old     new   delta
> cpu_init                                     676     668      -8
> ioperm_active                                 18       7     -11
> init_task                                   1296    1284     -12
> exit_thread                                  179      91     -88
> ioperm_get                                   103      10     -93
> __switch_to_xtra                             254     161     -93
> sys_iopl                                     127       -    -127
> SyS_iopl                                     127       -    -127

These are not really different ;)

> copy_thread                                  606     446    -160
> vt_ioctl                                    4127    3919    -208
> sys_ioperm                                   370       -    -370
> init_tss                                    8576     384   -8192
> doublefault_tss                             8576     384   -8192
> 
> 64-bit bloat-o-meter:
> add/remove: 0/4 grow/shrink: 2/9 up/down: 45/-9764 (-9719)
> function                                     old     new   delta
> cpu_init                                     958     995     +37
> arch_align_stack                              78      86      +8
> perf_event_exit_task                         525     517      -8
> ioperm_active                                 17       8      -9
> init_task                                   1968    1944     -24
> stub_iopl                                     81       -     -81
> ioperm_get                                   111      11    -100
> __switch_to_xtra                             281     164    -117
> exit_thread                                  212      92    -120
> vt_ioctl                                    4432    4304    -128
> sys_iopl                                     137       -    -137
> SyS_iopl                                     137       -    -137
> copy_thread                                  694     520    -174
> sys_ioperm                                   473       -    -473
> init_tss                                    8896     640   -8256
> 
> Signed-off-by: Josh Triplett <josh@joshtriplett.org>
> ---
>  arch/x86/Kconfig                      | 10 +++++
>  arch/x86/include/asm/paravirt.h       |  2 +
>  arch/x86/include/asm/paravirt_types.h |  2 +
>  arch/x86/include/asm/processor.h      | 51 +++++++++++++++++++++----
>  arch/x86/include/asm/syscalls.h       |  3 ++
>  arch/x86/kernel/Makefile              |  3 +-
>  arch/x86/kernel/cpu/common.c          | 12 +-----
>  arch/x86/kernel/entry_64.S            |  9 +++--
>  arch/x86/kernel/paravirt.c            |  2 +
>  arch/x86/kernel/process-io.h          | 71 +++++++++++++++++++++++++++++++++++
>  arch/x86/kernel/process.c             | 34 ++---------------
>  arch/x86/kernel/process_32.c          | 11 +-----
>  arch/x86/kernel/ptrace.c              |  8 ++++
>  arch/x86/xen/enlighten.c              |  4 ++
>  drivers/tty/vt/vt_ioctl.c             |  2 +-
>  kernel/sys_ni.c                       |  5 +++
>  16 files changed, 168 insertions(+), 61 deletions(-)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index e241a19..d5b1e68 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -976,6 +976,16 @@ config VM86
>  	  XFree86 to initialize some video cards via BIOS. Disabling this
>  	  option saves about 6k.
>  
> +config X86_IOPORT
> +	bool "iopl and ioperm system calls"
> +	default y
> +	---help---
> +	  This option enables the iopl and ioperm system calls, which allow
> +	  privileged userspace processes to directly access I/O ports. This
> +	  is used by some legacy software to drive hardware directly from
> +	  userspace rather than via a proper kernel driver. Unless you intend
> +	  to run such software, you can safely say N here.
> +

I think this entry should be under General setup / Configure standard kernel
features (expert users). Remove references to "legacy" and "proper driver".

>  config TOSHIBA
>  	tristate "Toshiba Laptop support"
>  	depends on X86_32
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index 401f350..2e106d5 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -299,10 +299,12 @@ static inline void write_idt_entry(gate_desc *dt, int entry, const gate_desc *g)
>  {
>  	PVOP_VCALL3(pv_cpu_ops.write_idt_entry, dt, entry, g);
>  }
> +#ifdef CONFIG_X86_IOPORT
>  static inline void set_iopl_mask(unsigned mask)
>  {
>  	PVOP_VCALL1(pv_cpu_ops.set_iopl_mask, mask);
>  }
> +#endif /* CONFIG_X86_IOPORT */
>  
>  /* The paravirtualized I/O functions */
>  static inline void slow_down_io(void)
> diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
> index aab8f67..bc5bfeb 100644
> --- a/arch/x86/include/asm/paravirt_types.h
> +++ b/arch/x86/include/asm/paravirt_types.h
> @@ -142,7 +142,9 @@ struct pv_cpu_ops {
>  
>  	void (*load_sp0)(struct tss_struct *tss, struct thread_struct *t);
>  
> +#ifdef CONFIG_X86_IOPORT
>  	void (*set_iopl_mask)(unsigned mask);
> +#endif /* CONFIG_X86_IOPORT */
>  
>  	void (*wbinvd)(void);
>  	void (*io_delay)(void);
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 03d3003..48f6fbe 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -256,7 +256,11 @@ struct x86_hw_tss {
>  /*
>   * IO-bitmap sizes:
>   */
> +#ifdef CONFIG_X86_IOPORT
>  #define IO_BITMAP_BITS			65536
> +#else
> +#define IO_BITMAP_BITS			0
> +#endif
>  #define IO_BITMAP_BYTES			(IO_BITMAP_BITS/8)
>  #define IO_BITMAP_LONGS			(IO_BITMAP_BYTES/sizeof(long))
>  #define INVALID_IO_BITMAP_OFFSET	0x8000
> @@ -269,6 +273,7 @@ struct tss_struct {
>  	 */
>  	struct x86_hw_tss	x86_tss;
>  
> +#ifdef CONFIG_X86_IOPORT
>  	/*
>  	 * The extra 1 is there because the CPU will access an
>  	 * additional byte beyond the end of the IO permission
> @@ -276,6 +281,7 @@ struct tss_struct {
>  	 * be within the limit.
>  	 */
>  	unsigned long		io_bitmap[IO_BITMAP_LONGS + 1];
> +#endif /* CONFIG_X86_IOPORT */
>  
>  	/*
>  	 * .. and then another 0x100 bytes for the emergency kernel stack:
> @@ -286,6 +292,24 @@ struct tss_struct {
>  
>  DECLARE_PER_CPU_SHARED_ALIGNED(struct tss_struct, init_tss);
>  
> +static inline void init_tss_io(struct tss_struct *t)
> +{
> +#ifdef CONFIG_X86_IOPORT
> +	int i;
> +
> +	t->x86_tss.io_bitmap_base = offsetof(struct tss_struct, io_bitmap);
> +
> +	/*
> +	 * <= is required because the CPU will access up to
> +	 * 8 bits beyond the end of the IO permission bitmap.
> +	 */
> +	for (i = 0; i <= IO_BITMAP_LONGS; i++)
> +		t->io_bitmap[i] = ~0UL;
> +#else
> +	t->x86_tss.io_bitmap_base = INVALID_IO_BITMAP_OFFSET;
> +#endif
> +}
> +
>  /*
>   * Save the original ist values for checking stack pointers during debugging
>   */
> @@ -484,13 +508,16 @@ struct thread_struct {
>  	unsigned int		saved_fs;
>  	unsigned int		saved_gs;
>  #endif
> +#ifdef CONFIG_X86_IOPORT
>  	/* IO permissions: */
>  	unsigned long		*io_bitmap_ptr;
>  	unsigned long		iopl;
>  	/* Max allowed port in the bitmap, in bytes: */
>  	unsigned		io_bitmap_max;
> +#endif /* CONFIG_X86_IOPORT */
>  };
>  
> +#ifdef CONFIG_X86_IOPORT
>  /*
>   * Set IOPL bits in EFLAGS from given mask
>   */
> @@ -509,6 +536,7 @@ static inline void native_set_iopl_mask(unsigned mask)
>  		      : "i" (~X86_EFLAGS_IOPL), "r" (mask));
>  #endif
>  }
> +#endif /* CONFIG_X86_IOPORT */
>  
>  static inline void
>  native_load_sp0(struct tss_struct *tss, struct thread_struct *thread)
> @@ -828,12 +856,8 @@ static inline void spin_lock_prefetch(const void *x)
>  #define STACK_TOP		TASK_SIZE
>  #define STACK_TOP_MAX		STACK_TOP
>  
> -#define INIT_THREAD  {							  \
> -	.sp0			= sizeof(init_stack) + (long)&init_stack, \
> -	.vm86_info		= NULL,					  \
> -	.sysenter_cs		= __KERNEL_CS,				  \
> -	.io_bitmap_ptr		= NULL,					  \
> -}
> +#ifdef CONFIG_X86_IOPORT
> +#define INIT_THREAD_IO .io_bitmap_ptr = NULL,
>  
>  /*
>   * Note that the .io_bitmap member must be extra-big. This is because
> @@ -841,6 +865,19 @@ static inline void spin_lock_prefetch(const void *x)
>   * permission bitmap. The extra byte must be all 1 bits, and must
>   * be within the limit.
>   */
> +#define INIT_TSS_IO .io_bitmap = { [0 ... IO_BITMAP_LONGS] = ~0 },
> +#else
> +#define INIT_THREAD_IO
> +#define INIT_TSS_IO
> +#endif
> +
> +#define INIT_THREAD  {							  \
> +	.sp0			= sizeof(init_stack) + (long)&init_stack, \
> +	.vm86_info		= NULL,					  \
> +	.sysenter_cs		= __KERNEL_CS,				  \
> +	INIT_THREAD_IO							  \
> +}
> +
>  #define INIT_TSS  {							  \
>  	.x86_tss = {							  \
>  		.sp0		= sizeof(init_stack) + (long)&init_stack, \
> @@ -848,7 +885,7 @@ static inline void spin_lock_prefetch(const void *x)
>  		.ss1		= __KERNEL_CS,				  \
>  		.io_bitmap_base	= INVALID_IO_BITMAP_OFFSET,		  \
>  	 },								  \
> -	.io_bitmap		= { [0 ... IO_BITMAP_LONGS] = ~0 },	  \
> +	INIT_TSS_IO							  \
>  }
>  
>  extern unsigned long thread_saved_pc(struct task_struct *tsk);
> diff --git a/arch/x86/include/asm/syscalls.h b/arch/x86/include/asm/syscalls.h
> index 592a6a6..4db79ea 100644
> --- a/arch/x86/include/asm/syscalls.h
> +++ b/arch/x86/include/asm/syscalls.h
> @@ -16,9 +16,12 @@
>  #include <linux/types.h>
>  
>  /* Common in X86_32 and X86_64 */
> +
> +#ifdef CONFIG_X86_IOPORT
>  /* kernel/ioport.c */
>  asmlinkage long sys_ioperm(unsigned long, unsigned long, int);
>  asmlinkage long sys_iopl(unsigned int);
> +#endif /* CONFIG_X86_IOPORT */
>  
>  /* kernel/ldt.c */
>  asmlinkage int sys_modify_ldt(int, void __user *, unsigned long);
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index a5408b9..9644737 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -20,7 +20,8 @@ CFLAGS_irq.o := -I$(src)/../include/asm/trace
>  
>  obj-y			:= process_$(BITS).o signal.o entry_$(BITS).o
>  obj-y			+= traps.o irq.o irq_$(BITS).o dumpstack_$(BITS).o
> -obj-y			+= time.o ioport.o ldt.o dumpstack.o nmi.o
> +obj-y			+= time.o ldt.o dumpstack.o nmi.o
> +obj-$(CONFIG_X86_IOPORT) += ioport.o
>  obj-y			+= setup.o x86_init.o i8259.o irqinit.o jump_label.o
>  obj-$(CONFIG_IRQ_WORK)  += irq_work.o
>  obj-y			+= probe_roms.o
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 2793d1f..01235ef 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1223,7 +1223,6 @@ void cpu_init(void)
>  	struct tss_struct *t;
>  	unsigned long v;
>  	int cpu;
> -	int i;
>  
>  	/*
>  	 * Load microcode on this cpu if a valid microcode is available.
> @@ -1285,14 +1284,7 @@ void cpu_init(void)
>  		}
>  	}
>  
> -	t->x86_tss.io_bitmap_base = offsetof(struct tss_struct, io_bitmap);
> -
> -	/*
> -	 * <= is required because the CPU will access up to
> -	 * 8 bits beyond the end of the IO permission bitmap.
> -	 */
> -	for (i = 0; i <= IO_BITMAP_LONGS; i++)
> -		t->io_bitmap[i] = ~0UL;
> +	init_tss_io(t);
>  
>  	atomic_inc(&init_mm.mm_count);
>  	me->active_mm = &init_mm;
> @@ -1351,7 +1343,7 @@ void cpu_init(void)
>  	load_TR_desc();
>  	load_LDT(&init_mm.context);
>  
> -	t->x86_tss.io_bitmap_base = offsetof(struct tss_struct, io_bitmap);
> +	init_tss_io(t);

This patch is too big. I think it would all look nicer if you added ioport.c in
one patch, and then convert the users in a separate patch?

>  #ifdef CONFIG_DOUBLEFAULT
>  	/* Set up doublefault TSS pointer in the GDT */
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index 1b69951..eb4f5ba 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -844,6 +844,11 @@ ENTRY(stub_\func)
>  END(stub_\func)
>  	.endm
>  
> +	FORK_LIKE  clone
> +	FORK_LIKE  fork
> +	FORK_LIKE  vfork
> +
> +#ifdef CONFIG_X86_IOPORT
>  	.macro FIXED_FRAME label,func
>  ENTRY(\label)
>  	CFI_STARTPROC
> @@ -856,10 +861,8 @@ ENTRY(\label)
>  END(\label)
>  	.endm
>  
> -	FORK_LIKE  clone
> -	FORK_LIKE  fork
> -	FORK_LIKE  vfork
>  	FIXED_FRAME stub_iopl, sys_iopl
> +#endif /* CONFIG_X86_IOPORT */
>  
>  ENTRY(ptregscall_common)
>  	DEFAULT_FRAME 1 8	/* offset 8: return address */
> diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
> index 1b10af8..46da3d9 100644
> --- a/arch/x86/kernel/paravirt.c
> +++ b/arch/x86/kernel/paravirt.c
> @@ -382,7 +382,9 @@ __visible struct pv_cpu_ops pv_cpu_ops = {
>  	.iret = native_iret,
>  	.swapgs = native_swapgs,
>  
> +#ifdef CONFIG_X86_IOPORT
>  	.set_iopl_mask = native_set_iopl_mask,
> +#endif /* CONFIG_X86_IOPORT */
>  	.io_delay = native_io_delay,
>  
>  	.start_context_switch = paravirt_nop,
> diff --git a/arch/x86/kernel/process-io.h b/arch/x86/kernel/process-io.h
> index d884444..3e773fa 100644
> --- a/arch/x86/kernel/process-io.h
> +++ b/arch/x86/kernel/process-io.h
> @@ -1,9 +1,17 @@
>  #ifndef _X86_KERNEL_PROCESS_IO_H
>  #define _X86_KERNEL_PROCESS_IO_H
>  
> +static inline void clear_thread_io_bitmap(struct task_struct *p)
> +{
> +#ifdef CONFIG_X86_IOPORT
> +	p->thread.io_bitmap_ptr = NULL;
> +#endif /* CONFIG_X86_IOPORT */
> +}
> +

This is thought of as ugly... Instead, do something like

#ifndef CONFIG_X86_IOPORT

static inline void clear_thread_io_bitmap(struct task_struct *p) {}
static inline int copy_io_bitmap(struct task_struct *me, struct task_struct *p) {return 0}
... etc...

#else

static inline void clear_thread_io_bitmap(struct task_struct *p)
{
   p->thread.io_bitmap_ptr = NULL;
}
... etc...

#endif


>  static inline int copy_io_bitmap(struct task_struct *me,
>  				 struct task_struct *p)
>  {
> +#ifdef CONFIG_X86_IOPORT
>  	if (unlikely(test_tsk_thread_flag(me, TIF_IO_BITMAP))) {
>  		p->thread.io_bitmap_ptr = kmemdup(me->thread.io_bitmap_ptr,
>  						  IO_BITMAP_BYTES, GFP_KERNEL);
> @@ -15,8 +23,71 @@ static inline int copy_io_bitmap(struct task_struct *me,
>  	} else {
>  		p->thread.io_bitmap_ptr = NULL;
>  	}
> +#endif /* CONFIG_X86_IOPORT */
>  
>  	return 0;
>  }
>  
> +static inline void exit_thread_io(struct task_struct *me)
> +{
> +#ifdef CONFIG_X86_IOPORT
> +        struct thread_struct *t = &me->thread;
> +        unsigned long *bp = t->io_bitmap_ptr;
> +
> +        if (bp) {
> +                struct tss_struct *tss = &per_cpu(init_tss, get_cpu());
> +
> +                t->io_bitmap_ptr = NULL;
> +                clear_thread_flag(TIF_IO_BITMAP);
> +                /*
> +                 * Careful, clear this in the TSS too:
> +                 */
> +                memset(tss->io_bitmap, 0xff, t->io_bitmap_max);
> +                t->io_bitmap_max = 0;
> +                put_cpu();
> +                kfree(bp);
> +        }
> +#endif /* CONFIG_X86_IOPORT */
> +}
> +
> +static inline void switch_iopl_mask(struct thread_struct *prev,
> +				    struct thread_struct *next)
> +{
> +#ifdef CONFIG_X86_IOPORT
> +        /*
> +         * Restore IOPL if needed.  In normal use, the flags restore
> +         * in the switch assembly will handle this.  But if the kernel
> +         * is running virtualized at a non-zero CPL, the popf will
> +         * not restore flags, so it must be done in a separate step.
> +         */
> +        if (get_kernel_rpl() && unlikely(prev->iopl != next->iopl))
> +                set_iopl_mask(next->iopl);
> +#endif /* CONFIG_X86_IOPORT */
> +}
> +
> +static inline void switch_io_bitmap(struct tss_struct *tss,
> +				    struct task_struct *prev_p,
> +				    struct task_struct *next_p)
> +{
> +#ifdef CONFIG_X86_IOPORT
> +        struct thread_struct *prev, *next;
> +        prev = &prev_p->thread;
> +        next = &next_p->thread;
> +
> +        if (test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
> +                /*
> +                 * Copy the relevant range of the IO bitmap.
> +                 * Normally this is 128 bytes or less:
> +                 */
> +                memcpy(tss->io_bitmap, next->io_bitmap_ptr,
> +                       max(prev->io_bitmap_max, next->io_bitmap_max));
> +        } else if (test_tsk_thread_flag(prev_p, TIF_IO_BITMAP)) {
> +                /*
> +                 * Clear any possible leftover bits:
> +                 */
> +                memset(tss->io_bitmap, 0xff, prev->io_bitmap_max);
> +        }
> +#endif /* CONFIG_X86_IOPORT */
> +}
> +
>  #endif /* _X86_KERNEL_PROCESS_IO_H */
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index c83516b..2df5b00 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -29,6 +29,8 @@
>  #include <asm/debugreg.h>
>  #include <asm/nmi.h>
>  
> +#include "process-io.h"
> +
>  /*
>   * per-CPU TSS segments. Threads are completely 'soft' on Linux,
>   * no more per-task TSS's. The TSS size is kept cacheline-aligned
> @@ -101,23 +103,7 @@ void arch_task_cache_init(void)
>  void exit_thread(void)
>  {
>  	struct task_struct *me = current;
> -	struct thread_struct *t = &me->thread;
> -	unsigned long *bp = t->io_bitmap_ptr;
> -
> -	if (bp) {
> -		struct tss_struct *tss = &per_cpu(init_tss, get_cpu());
> -
> -		t->io_bitmap_ptr = NULL;
> -		clear_thread_flag(TIF_IO_BITMAP);
> -		/*
> -		 * Careful, clear this in the TSS too:
> -		 */
> -		memset(tss->io_bitmap, 0xff, t->io_bitmap_max);
> -		t->io_bitmap_max = 0;
> -		put_cpu();
> -		kfree(bp);
> -	}
> -
> +	exit_thread_io(me);
>  	drop_fpu(me);
>  }
>  
> @@ -222,19 +208,7 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
>  			hard_enable_TSC();
>  	}
>  
> -	if (test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
> -		/*
> -		 * Copy the relevant range of the IO bitmap.
> -		 * Normally this is 128 bytes or less:
> -		 */
> -		memcpy(tss->io_bitmap, next->io_bitmap_ptr,
> -		       max(prev->io_bitmap_max, next->io_bitmap_max));
> -	} else if (test_tsk_thread_flag(prev_p, TIF_IO_BITMAP)) {
> -		/*
> -		 * Clear any possible leftover bits:
> -		 */
> -		memset(tss->io_bitmap, 0xff, prev->io_bitmap_max);
> -	}
> +	switch_io_bitmap(tss, prev_p, next_p);
>  	propagate_user_return_notify(prev_p, next_p);
>  }
>  
> diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
> index e86028b..dc780cb 100644
> --- a/arch/x86/kernel/process_32.c
> +++ b/arch/x86/kernel/process_32.c
> @@ -155,7 +155,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,
>  		childregs->cs = __KERNEL_CS | get_kernel_rpl();
>  		childregs->flags = X86_EFLAGS_IF | X86_EFLAGS_FIXED;
>  		p->fpu_counter = 0;
> -		p->thread.io_bitmap_ptr = NULL;
> +		clear_thread_io_bitmap(p);
>  		memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps));
>  		return 0;
>  	}
> @@ -269,14 +269,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
>  	 */
>  	load_TLS(next, cpu);
>  
> -	/*
> -	 * Restore IOPL if needed.  In normal use, the flags restore
> -	 * in the switch assembly will handle this.  But if the kernel
> -	 * is running virtualized at a non-zero CPL, the popf will
> -	 * not restore flags, so it must be done in a separate step.
> -	 */
> -	if (get_kernel_rpl() && unlikely(prev->iopl != next->iopl))
> -		set_iopl_mask(next->iopl);
> +	switch_iopl_mask(prev, next);
>  
>  	/*
>  	 * Now maybe handle debug registers and/or IO bitmaps

If copy_thread would be in process.c instead, the .h-file would be unnecessary,
right?

> diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
> index 7461f50..f071bfa 100644
> --- a/arch/x86/kernel/ptrace.c
> +++ b/arch/x86/kernel/ptrace.c
> @@ -785,7 +785,11 @@ static int ptrace_set_debugreg(struct task_struct *tsk, int n,
>  static int ioperm_active(struct task_struct *target,
>  			 const struct user_regset *regset)
>  {
> +#ifdef CONFIG_X86_IOPORT
>  	return target->thread.io_bitmap_max / regset->size;
> +#else
> +	return 0;
> +#endif
>  }
>  
>  static int ioperm_get(struct task_struct *target,
> @@ -793,12 +797,16 @@ static int ioperm_get(struct task_struct *target,
>  		      unsigned int pos, unsigned int count,
>  		      void *kbuf, void __user *ubuf)
>  {
> +#ifdef CONFIG_X86_IOPORT
>  	if (!target->thread.io_bitmap_ptr)
>  		return -ENXIO;
>  
>  	return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
>  				   target->thread.io_bitmap_ptr,
>  				   0, IO_BITMAP_BYTES);
> +#else
> +	return -ENXIO;
> +#endif
>  }
>  
>  /*
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index fa6ade7..f485d92 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -911,6 +911,7 @@ static void xen_load_sp0(struct tss_struct *tss,
>  	xen_mc_issue(PARAVIRT_LAZY_CPU);
>  }
>  
> +#ifdef CONFIG_X86_IOPORT
>  static void xen_set_iopl_mask(unsigned mask)
>  {
>  	struct physdev_set_iopl set_iopl;
> @@ -919,6 +920,7 @@ static void xen_set_iopl_mask(unsigned mask)
>  	set_iopl.iopl = (mask == 0) ? 1 : (mask >> 12) & 3;
>  	HYPERVISOR_physdev_op(PHYSDEVOP_set_iopl, &set_iopl);
>  }
> +#endif /* CONFIG_X86_IOPORT */
>  
>  static void xen_io_delay(void)
>  {
> @@ -1277,7 +1279,9 @@ static const struct pv_cpu_ops xen_cpu_ops __initconst = {
>  	.write_idt_entry = xen_write_idt_entry,
>  	.load_sp0 = xen_load_sp0,
>  
> +#ifdef CONFIG_X86_IOPORT
>  	.set_iopl_mask = xen_set_iopl_mask,
> +#endif /* CONFIG_X86_IOPORT */
>  	.io_delay = xen_io_delay,
>  
>  	/* Xen takes care of %gs when switching to usermode for us */
> diff --git a/drivers/tty/vt/vt_ioctl.c b/drivers/tty/vt/vt_ioctl.c
> index 2bd78e2..7c7d405 100644
> --- a/drivers/tty/vt/vt_ioctl.c
> +++ b/drivers/tty/vt/vt_ioctl.c
> @@ -410,7 +410,7 @@ int vt_ioctl(struct tty_struct *tty,
>  		 *
>  		 * XXX: you should never use these, just call ioperm directly..
>  		 */
> -#ifdef CONFIG_X86
> +#ifdef CONFIG_X86_IOPORT
>  	case KDADDIO:
>  	case KDDELIO:
>  		/*
> diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> index 7078052..e7f4a35 100644
> --- a/kernel/sys_ni.c
> +++ b/kernel/sys_ni.c
> @@ -209,3 +209,8 @@ cond_syscall(compat_sys_open_by_handle_at);
>  
>  /* compare kernel pointers */
>  cond_syscall(sys_kcmp);
> +
> +/* userspace I/O port access */
> +cond_syscall(stub_iopl);
> +cond_syscall(sys_iopl);
> +cond_syscall(sys_ioperm);
> -- 
> 1.8.4.rc3
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] x86: process: Unify 32-bit and 64-bit copy_thread I/O bitmap handling
  2013-10-31 20:01   ` Alexander van Heukelum
@ 2013-11-01 16:33     ` Josh Triplett
  0 siblings, 0 replies; 17+ messages in thread
From: Josh Triplett @ 2013-11-01 16:33 UTC (permalink / raw)
  To: Alexander van Heukelum
  Cc: Jeremy Fitzhardinge, x86, Len Brown, Frederic Weisbecker,
	H. Peter Anvin, virtualization, Paul Gortmaker, Raghavendra K T,
	David Herrmann, Masami Hiramatsu, Seiji Aguchi, Jiri Slaby,
	Alok Kataria, Jesper Nilsson, Andi Kleen, Daniel Lezcano,
	Ingo Molnar, Steven Rostedt, xen-devel, Borislav Petkov,
	Fenghua Yu, Kees Cook, Konrad Rzeszutek Wilk, Ross

On Thu, Oct 31, 2013 at 09:01:30PM +0100, Alexander van Heukelum wrote:
> On Tue, Oct 22, 2013, at 3:34, Josh Triplett wrote:
> > The 32-bit and 64-bit versions of copy_thread have functionally
> > identical handling for copying the I/O bitmap, modulo differences in
> > error handling.  Clean up the error paths in both by moving the copy of
> > the I/O bitmap to the end, to eliminate the need to free it if
> > subsequent copy steps fail; move the resulting identical code to a
> > static inline in a common header.
> > 
> > Signed-off-by: Josh Triplett <josh@joshtriplett.org>
> > ---
> >  arch/x86/kernel/process-io.h | 22 ++++++++++++++++++++++
> 
> I don't particularly like this new file. It's also not in a proper place.

It's a private header, and private headers are normally put in the
directories with the source code, not in the include directories.

The need for it becomes more obvious later in the patch series.

> >  arch/x86/kernel/process_32.c | 29 ++++++++---------------------
> >  arch/x86/kernel/process_64.c | 26 +++++---------------------
> 
> Yay, this I like :). However, unification is best done in a separate step. First
> make the two parts equal in one or two patches. Then, in a separate patch,
> do the mechanical unification.

I could certainly split the first patch into two, sure.

> >  3 files changed, 35 insertions(+), 42 deletions(-)
> >  create mode 100644 arch/x86/kernel/process-io.h
> > 
> > diff --git a/arch/x86/kernel/process-io.h b/arch/x86/kernel/process-io.h
> > new file mode 100644
> > index 0000000..d884444
> > --- /dev/null
> > +++ b/arch/x86/kernel/process-io.h
> > @@ -0,0 +1,22 @@
> > +#ifndef _X86_KERNEL_PROCESS_IO_H
> > +#define _X86_KERNEL_PROCESS_IO_H
> > +
> > +static inline int copy_io_bitmap(struct task_struct *me,
> > +				 struct task_struct *p)
> > +{
> > +	if (unlikely(test_tsk_thread_flag(me, TIF_IO_BITMAP))) {
> > +		p->thread.io_bitmap_ptr = kmemdup(me->thread.io_bitmap_ptr,
> > +						  IO_BITMAP_BYTES, GFP_KERNEL);
> > +		if (!p->thread.io_bitmap_ptr) {
> > +			p->thread.io_bitmap_max = 0;
> > +			return -ENOMEM;
> > +		}
> > +		set_tsk_thread_flag(p, TIF_IO_BITMAP);
> > +	} else {
> > +		p->thread.io_bitmap_ptr = NULL;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> 
> I really don't like the .h-file like this. Could you try avoiding it? I think
> it's possible to unify the copy_thread function and in the end move it
> to process.c instead. The arch-specific parts can be split out in static
> functions guarded by CONFIG_X86_64 (cooking up good names is
> the challenge here ;) ).

It's theoretically possible to completely unify copy_thread by adding
several more functions like this to abstract the low-level details;
however, complete unification wasn't the goal of this patch series.  I
unified and pulled out this portion because it's also the portion I need
to *remove* when compiling with !CONFIG_X86_IOPORT.

- Josh Triplett

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] x86: tss: Eliminate fragile calculation of TSS segment limit
  2013-10-31 20:02   ` Alexander van Heukelum
@ 2013-11-01 16:40     ` Josh Triplett
  0 siblings, 0 replies; 17+ messages in thread
From: Josh Triplett @ 2013-11-01 16:40 UTC (permalink / raw)
  To: Alexander van Heukelum
  Cc: Jeremy Fitzhardinge, x86, Len Brown, Frederic Weisbecker,
	H. Peter Anvin, virtualization, Paul Gortmaker, Raghavendra K T,
	David Herrmann, Masami Hiramatsu, Seiji Aguchi, Jiri Slaby,
	Alok Kataria, Jesper Nilsson, Andi Kleen, Daniel Lezcano,
	Ingo Molnar, Steven Rostedt, xen-devel, Borislav Petkov,
	Fenghua Yu, Kees Cook, Konrad Rzeszutek Wilk, Ross

On Thu, Oct 31, 2013 at 09:02:14PM +0100, Alexander van Heukelum wrote:
> Hi Josh,
> 
> On Tue, Oct 22, 2013, at 3:34, Josh Triplett wrote:
> > __set_tss_desc has a complex calculation of the TSS segment limit,
> > duplicating the quirky details of the I/O bitmap array length, and
> > requiring a complex comment to explain.  Replace that calculation with a
> > simpler one based on the offsetof the "stack" field that follows the
> > array.
> > 
> > That then removes the last use of IO_BITMAP_OFFSET, so delete it.
> > 
> > Signed-off-by: Josh Triplett <josh@joshtriplett.org>
> > ---
> >  arch/x86/include/asm/desc.h      | 11 +----------
> >  arch/x86/include/asm/processor.h |  3 ++-
> >  2 files changed, 3 insertions(+), 11 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
> > index b90e5df..17ac92f 100644
> > --- a/arch/x86/include/asm/desc.h
> > +++ b/arch/x86/include/asm/desc.h
> > @@ -177,16 +177,7 @@ static inline void __set_tss_desc(unsigned cpu, unsigned int entry, void *addr)
> >  	struct desc_struct *d = get_cpu_gdt_table(cpu);
> >  	tss_desc tss;
> >  
> > -	/*
> > -	 * sizeof(unsigned long) coming from an extra "long" at the end
> > -	 * of the iobitmap. See tss_struct definition in processor.h
> > -	 *
> > -	 * -1? seg base+limit should be pointing to the address of the
> > -	 * last valid byte
> > -	 */
> > -	set_tssldt_descriptor(&tss, (unsigned long)addr, DESC_TSS,
> > -			      IO_BITMAP_OFFSET + IO_BITMAP_BYTES +
> > -			      sizeof(unsigned long) - 1);
> > +	set_tssldt_descriptor(&tss, (unsigned long)addr, DESC_TSS, TSS_LIMIT);
> >  	write_gdt_entry(d, entry, &tss, DESC_TSS);
> >  }
> >  
> > diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> > index 987c75e..03d3003 100644
> > --- a/arch/x86/include/asm/processor.h
> > +++ b/arch/x86/include/asm/processor.h
> > @@ -259,9 +259,10 @@ struct x86_hw_tss {
> >  #define IO_BITMAP_BITS			65536
> >  #define IO_BITMAP_BYTES			(IO_BITMAP_BITS/8)
> >  #define IO_BITMAP_LONGS			(IO_BITMAP_BYTES/sizeof(long))
> > -#define IO_BITMAP_OFFSET		offsetof(struct tss_struct, io_bitmap)
> >  #define INVALID_IO_BITMAP_OFFSET	0x8000
> >  
> > +#define TSS_LIMIT	(offsetof(struct tss_struct, stack) - 1)
> > +
> 
> I wonder if the 'stack' in tss_struct has any meaning anymore. Can it be removed?
> If so, the offsetof can be changed to sizeof(struct tss_struct), which looks more
> natural to me.

That might be possible, if nothing references the stack.  The obvious
references would be easy to search for, but I'd be concerned that
something might reference the stack in some non-obvious way, like "right
after the TSS base+limit", which a search wouldn't necessarily turn up.

In any case, that change could easily occur in another patch on top of
this one.

> Even so, I think this is already an improvement.
> 
> Acked-by: Alexander van Heukelum <heukelum@fastmail.fm>

Thanks.

- Josh Triplett

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] x86: Support compiling out userspace I/O (iopl and ioperm)
  2013-10-31 20:04   ` Alexander van Heukelum
@ 2013-11-01 17:19     ` Josh Triplett
  0 siblings, 0 replies; 17+ messages in thread
From: Josh Triplett @ 2013-11-01 17:19 UTC (permalink / raw)
  To: Alexander van Heukelum
  Cc: Jeremy Fitzhardinge, x86, Len Brown, Frederic Weisbecker,
	H. Peter Anvin, virtualization, Paul Gortmaker, Raghavendra K T,
	David Herrmann, Masami Hiramatsu, Seiji Aguchi, Jiri Slaby,
	Alok Kataria, Jesper Nilsson, Andi Kleen, Daniel Lezcano,
	Ingo Molnar, Steven Rostedt, xen-devel, Borislav Petkov,
	Fenghua Yu, Kees Cook, Konrad Rzeszutek Wilk, Ross

On Thu, Oct 31, 2013 at 09:04:42PM +0100, Alexander van Heukelum wrote:
> On Tue, Oct 22, 2013, at 3:35, Josh Triplett wrote:
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -976,6 +976,16 @@ config VM86
> >  	  XFree86 to initialize some video cards via BIOS. Disabling this
> >  	  option saves about 6k.
> >  
> > +config X86_IOPORT
> > +	bool "iopl and ioperm system calls"
> > +	default y
> > +	---help---
> > +	  This option enables the iopl and ioperm system calls, which allow
> > +	  privileged userspace processes to directly access I/O ports. This
> > +	  is used by some legacy software to drive hardware directly from
> > +	  userspace rather than via a proper kernel driver. Unless you intend
> > +	  to run such software, you can safely say N here.
> > +
> 
> I think this entry should be under General setup / Configure standard kernel
> features (expert users).

It's entirely x86-specific, and it's similar to VM86 just above it; it
belongs on the x86-specific menu.

> Remove references to "legacy" and "proper driver".

Hardware drivers belong in the kernel, and anything using iopl or ioperm
won't run on x86-64, which argues rather strongly for its obsolescence.
There's also /dev/port for cleaner access to ports from userspace.
However, I've rephrased this for v2.

> > --- a/arch/x86/kernel/cpu/common.c
> > +++ b/arch/x86/kernel/cpu/common.c
> > @@ -1223,7 +1223,6 @@ void cpu_init(void)
> >  	struct tss_struct *t;
> >  	unsigned long v;
> >  	int cpu;
> > -	int i;
> >  
> >  	/*
> >  	 * Load microcode on this cpu if a valid microcode is available.
> > @@ -1285,14 +1284,7 @@ void cpu_init(void)
> >  		}
> >  	}
> >  
> > -	t->x86_tss.io_bitmap_base = offsetof(struct tss_struct, io_bitmap);
> > -
> > -	/*
> > -	 * <= is required because the CPU will access up to
> > -	 * 8 bits beyond the end of the IO permission bitmap.
> > -	 */
> > -	for (i = 0; i <= IO_BITMAP_LONGS; i++)
> > -		t->io_bitmap[i] = ~0UL;
> > +	init_tss_io(t);
> >  
> >  	atomic_inc(&init_mm.mm_count);
> >  	me->active_mm = &init_mm;
> > @@ -1351,7 +1343,7 @@ void cpu_init(void)
> >  	load_TR_desc();
> >  	load_LDT(&init_mm.context);
> >  
> > -	t->x86_tss.io_bitmap_base = offsetof(struct tss_struct, io_bitmap);
> > +	init_tss_io(t);
> 
> This patch is too big. I think it would all look nicer if you added ioport.c in
> one patch, and then convert the users in a separate patch?

I'm not sure what you mean by "added ioport.c"; ioport.c already exists,
and this patch just makes it optional.

> > --- a/arch/x86/kernel/process-io.h
> > +++ b/arch/x86/kernel/process-io.h
> > @@ -1,9 +1,17 @@
> >  #ifndef _X86_KERNEL_PROCESS_IO_H
> >  #define _X86_KERNEL_PROCESS_IO_H
> >  
> > +static inline void clear_thread_io_bitmap(struct task_struct *p)
> > +{
> > +#ifdef CONFIG_X86_IOPORT
> > +	p->thread.io_bitmap_ptr = NULL;
> > +#endif /* CONFIG_X86_IOPORT */
> > +}
> > +
> 
> This is thought of as ugly... Instead, do something like
> 
> #ifndef CONFIG_X86_IOPORT
> 
> static inline void clear_thread_io_bitmap(struct task_struct *p) {}
> static inline int copy_io_bitmap(struct task_struct *me, struct task_struct *p) {return 0}
> ... etc...
> 
> #else
> 
> static inline void clear_thread_io_bitmap(struct task_struct *p)
> {
>    p->thread.io_bitmap_ptr = NULL;
> }
> ... etc...
> 
> #endif

In .c files, any ifdefs at all are ugly; in .h files, both styles are
quite common, and in particular the style I used is common when omitting
the entire body of the function.  It has the advantage of keeping a
common function signature rather than duplicating it.

> > --- a/arch/x86/kernel/process_32.c
> > +++ b/arch/x86/kernel/process_32.c
> > @@ -155,7 +155,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,
> >  		childregs->cs = __KERNEL_CS | get_kernel_rpl();
> >  		childregs->flags = X86_EFLAGS_IF | X86_EFLAGS_FIXED;
> >  		p->fpu_counter = 0;
> > -		p->thread.io_bitmap_ptr = NULL;
> > +		clear_thread_io_bitmap(p);
> >  		memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps));
> >  		return 0;
> >  	}
> > @@ -269,14 +269,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
> >  	 */
> >  	load_TLS(next, cpu);
> >  
> > -	/*
> > -	 * Restore IOPL if needed.  In normal use, the flags restore
> > -	 * in the switch assembly will handle this.  But if the kernel
> > -	 * is running virtualized at a non-zero CPL, the popf will
> > -	 * not restore flags, so it must be done in a separate step.
> > -	 */
> > -	if (get_kernel_rpl() && unlikely(prev->iopl != next->iopl))
> > -		set_iopl_mask(next->iopl);
> > +	switch_iopl_mask(prev, next);
> >  
> >  	/*
> >  	 * Now maybe handle debug registers and/or IO bitmaps
> 
> If copy_thread would be in process.c instead, the .h-file would be unnecessary,
> right?

No, there are calls to the new functions in the .h file in several other
places.

- Josh Triplett

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2013-11-01 17:19 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-10-22  2:33 [PATCH 0/3] x86: Support compiling out userspace I/O (iopl and ioperm) Josh Triplett
2013-10-22  2:34 ` [PATCH 1/3] x86: process: Unify 32-bit and 64-bit copy_thread I/O bitmap handling Josh Triplett
2013-10-30 22:21   ` Kees Cook
2013-10-31 20:01   ` Alexander van Heukelum
2013-11-01 16:33     ` Josh Triplett
2013-10-22  2:34 ` [PATCH 2/3] x86: tss: Eliminate fragile calculation of TSS segment limit Josh Triplett
2013-10-30 22:22   ` Kees Cook
2013-10-30 22:53     ` H. Peter Anvin
2013-10-31 11:17       ` Josh Triplett
2013-10-31 11:12     ` Josh Triplett
2013-10-31 20:02   ` Alexander van Heukelum
2013-11-01 16:40     ` Josh Triplett
2013-10-22  2:35 ` [PATCH 3/3] x86: Support compiling out userspace I/O (iopl and ioperm) Josh Triplett
2013-10-26  3:17   ` Stephen Hemminger
2013-10-26  4:30     ` Kees Cook
2013-10-31 20:04   ` Alexander van Heukelum
2013-11-01 17:19     ` Josh Triplett

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).