* [patch 0/8] Basic infrastructure patches for a paravirtualized kernel
@ 2006-08-03  0:25 Jeremy Fitzhardinge
  2006-08-03  0:25 ` [patch 1/8] Remove locally-defined ldt structure in favour of standard type Jeremy Fitzhardinge
                   ` (7 more replies)
  0 siblings, 8 replies; 43+ messages in thread
From: Jeremy Fitzhardinge @ 2006-08-03  0:25 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, virtualization, xen-devel, Jeremy Fitzhardinge

Hi Andrew,

This series of patches lays the basic groundwork for the
paravirtualized kernel patches coming later on.  I think this lot is
ready for the rough-and-tumble world of the -mm tree.

The main change from the last posting is that all the page-table
related patches have been moved out, and will be posted separately.

Also, the off-by-one in reserving the top of address space has been
fixed.

For the most part, these patches do nothing or very little.  The
patches should be self-explanatory, but the overview is:

Helper functions for later use:
	2/8: Implement always-locked bit ops...
	8/8: Put .note.* sections into a PT_NOTE segment in vmlinux

Cleanups:
	1/8: Remove locally-defined ldt structure in favour of standard type
	3/8: Allow a kernel to not be in ring 0
	5/8: Roll all the cpuid asm into one __cpuid call

Hooks:
	4/8: Replace sensitive instructions with macros
	6/8: Make __FIXADDR_TOP variable to allow it to make space...
	7/8: Add a bootparameter to reserve high linear address...

8/8 "Put .note.* sections into a PT_NOTE segment in vmlinux" is mostly
here to shake out problems early.  It slightly changes the way the
vmlinux image is linked together, and it uses the somewhat esoteric
PHDRS command in vmlinux.lds.  I want to make sure that this doesn't
provoke any problems in the various binutils people are using.

Thanks,
	J


* Re: [patch 3/8] Allow a kernel to not be in ring 0.
@ 2006-08-05  0:41 Chuck Ebbert
  2006-08-05  4:26 ` Zachary Amsden
  0 siblings, 1 reply; 43+ messages in thread
From: Chuck Ebbert @ 2006-08-05  0:41 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Jeremy Fitzhardinge, linux-kernel, virtualization, Xen-devel,
	Rusty Russell, Zachary Amsden

In-Reply-To: <20060803002518.190834642@xensource.com>

On Wed, 02 Aug 2006 17:25:13 -0700, Jeremy Fitzhardinge wrote:

> We allow for the fact that the guest kernel may not run in ring 0.
> This requires some abstraction in a few places when setting %cs or
> checking privilege level (user vs kernel).

I made some changes:

a. Added some comments about the SEGMENT_IS_*_CODE() macros.
b. Added a USER_RPL macro.  (You were comparing a value to a mask
   in some places and to the magic number 3 in other places.)
c. Changed the entry.S tests for LDT stack segment to use the macros.



From: Jeremy Fitzhardinge <jeremy@xensource.com>

We allow for the fact that the guest kernel may not run in ring 0.
This requires some abstraction in a few places when setting %cs or
checking privilege level (user vs kernel).

This is Chris' [RFC PATCH 15/33] move segment checks to subarch,
except rather than using #define USER_MODE_MASK which depends on a
config option, we use Zach's more flexible approach of assuming ring 3
== userspace.  I also used "get_kernel_rpl()" over "get_kernel_cs()"
because I think it reads better in the code...

1) Remove the hardcoded 3 and introduce #define SEGMENT_RPL_MASK 3
2) Add a get_kernel_rpl() macro, and don't assume it's zero.
3) Use USER_RPL macro instead of hardcoded 3
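
As a rough illustration of why the "!= 0" test has to go, the sketch
below compares the old and new checks in user space. struct regs_sketch
and the *_old/*_new function names are hypothetical stand-ins for
pt_regs and the helpers in ptrace.h; VM_MASK is assumed to be the
EFLAGS.VM bit (bit 17).

```c
#include <stdint.h>

#define VM_MASK          0x00020000u  /* assumed: EFLAGS.VM, bit 17 */
#define SEGMENT_RPL_MASK 0x3u
#define USER_RPL         0x3u

/* Hypothetical stand-in for the two pt_regs fields the checks read. */
struct regs_sketch {
	uint32_t xcs;
	uint32_t eflags;
};

/* Old test: any non-zero RPL counts as user mode. */
static int user_mode_old(const struct regs_sketch *r)
{
	return (r->xcs & 3) != 0;
}

/* New test: only ring 3 counts as user mode. */
static int user_mode_new(const struct regs_sketch *r)
{
	return (r->xcs & SEGMENT_RPL_MASK) == USER_RPL;
}

/* New test including vm86: VM_MASK alone already exceeds USER_RPL. */
static int user_mode_vm_new(const struct regs_sketch *r)
{
	return ((r->xcs & SEGMENT_RPL_MASK) | (r->eflags & VM_MASK)) >= USER_RPL;
}
```

With a guest kernel running in ring 1 (say xcs == 0x61), the old test
reports user mode while the new ones do not. A vm86 frame has VM set in
eflags, so user_mode_vm_new() reports user mode regardless of the RPL
bits found in xcs.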

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>

---

 arch/i386/kernel/entry.S   |    9 +++++----
 arch/i386/kernel/process.c |    2 +-
 arch/i386/mm/extable.c     |    2 +-
 arch/i386/mm/fault.c       |   11 ++++-------
 include/asm-i386/ptrace.h  |    5 +++--
 include/asm-i386/segment.h |   12 ++++++++++++
 6 files changed, 26 insertions(+), 15 deletions(-)

--- 2.6.18-rc3-32.orig/arch/i386/kernel/entry.S
+++ 2.6.18-rc3-32/arch/i386/kernel/entry.S
@@ -229,8 +229,9 @@ ret_from_intr:
 check_userspace:
 	movl EFLAGS(%esp), %eax		# mix EFLAGS and CS
 	movb CS(%esp), %al
-	testl $(VM_MASK | 3), %eax
-	jz resume_kernel
+	andl $(VM_MASK | SEGMENT_RPL_MASK), %eax
+	cmpl $USER_RPL, %eax
+	jb resume_kernel		# not returning to v8086 or userspace
 ENTRY(resume_userspace)
  	cli				# make sure we don't miss an interrupt
 					# setting need_resched or sigpending
@@ -367,8 +368,8 @@ restore_all:
 	# See comments in process.c:copy_thread() for details.
 	movb OLDSS(%esp), %ah
 	movb CS(%esp), %al
-	andl $(VM_MASK | (4 << 8) | 3), %eax
-	cmpl $((4 << 8) | 3), %eax
+	andl $(VM_MASK | (4 << 8) | SEGMENT_RPL_MASK), %eax
+	cmpl $((4 << 8) | USER_RPL), %eax
 	CFI_REMEMBER_STATE
 	je ldt_ss			# returning to user-space with LDT SS
 restore_nocheck:
--- 2.6.18-rc3-32.orig/arch/i386/kernel/process.c
+++ 2.6.18-rc3-32/arch/i386/kernel/process.c
@@ -346,7 +346,7 @@ int kernel_thread(int (*fn)(void *), voi
 	regs.xes = __USER_DS;
 	regs.orig_eax = -1;
 	regs.eip = (unsigned long) kernel_thread_helper;
-	regs.xcs = __KERNEL_CS;
+	regs.xcs = __KERNEL_CS | get_kernel_rpl();
 	regs.eflags = X86_EFLAGS_IF | X86_EFLAGS_SF | X86_EFLAGS_PF | 0x2;
 
 	/* Ok, create the new process.. */
--- 2.6.18-rc3-32.orig/arch/i386/mm/extable.c
+++ 2.6.18-rc3-32/arch/i386/mm/extable.c
@@ -11,7 +11,7 @@ int fixup_exception(struct pt_regs *regs
 	const struct exception_table_entry *fixup;
 
 #ifdef CONFIG_PNPBIOS
-	if (unlikely((regs->xcs & ~15) == (GDT_ENTRY_PNPBIOS_BASE << 3)))
+	if (unlikely(SEGMENT_IS_PNP_CODE(regs->xcs)))
 	{
 		extern u32 pnp_bios_fault_eip, pnp_bios_fault_esp;
 		extern u32 pnp_bios_is_utter_crap;
--- 2.6.18-rc3-32.orig/arch/i386/mm/fault.c
+++ 2.6.18-rc3-32/arch/i386/mm/fault.c
@@ -27,6 +27,7 @@
 #include <asm/uaccess.h>
 #include <asm/desc.h>
 #include <asm/kdebug.h>
+#include <asm/segment.h>
 
 extern void die(const char *,struct pt_regs *,long);
 
@@ -119,10 +120,10 @@ static inline unsigned long get_segment_
 	}
 
 	/* The standard kernel/user address space limit. */
-	*eip_limit = (seg & 3) ? USER_DS.seg : KERNEL_DS.seg;
+	*eip_limit = user_mode(regs) ? USER_DS.seg : KERNEL_DS.seg;
 	
 	/* By far the most common cases. */
-	if (likely(seg == __USER_CS || seg == __KERNEL_CS))
+	if (likely(SEGMENT_IS_FLAT_CODE(seg)))
 		return eip;
 
 	/* Check the segment exists, is within the current LDT/GDT size,
@@ -436,11 +437,7 @@ good_area:
 	write = 0;
 	switch (error_code & 3) {
 		default:	/* 3: write, present */
-#ifdef TEST_VERIFY_AREA
-			if (regs->cs == KERNEL_CS)
-				printk("WP fault at %08lx\n", regs->eip);
-#endif
-			/* fall through */
+			/* fall through */
 		case 2:		/* write, not present */
 			if (!(vma->vm_flags & VM_WRITE))
 				goto bad_area;
--- 2.6.18-rc3-32.orig/include/asm-i386/ptrace.h
+++ 2.6.18-rc3-32/include/asm-i386/ptrace.h
@@ -60,6 +60,7 @@ struct pt_regs {
 #ifdef __KERNEL__
 
 #include <asm/vm86.h>
+#include <asm/segment.h>
 
 struct task_struct;
 extern void send_sigtrap(struct task_struct *tsk, struct pt_regs *regs, int error_code);
@@ -73,11 +74,11 @@ extern void send_sigtrap(struct task_str
  */
 static inline int user_mode(struct pt_regs *regs)
 {
-	return (regs->xcs & 3) != 0;
+	return (regs->xcs & SEGMENT_RPL_MASK) == USER_RPL;
 }
 static inline int user_mode_vm(struct pt_regs *regs)
 {
-	return ((regs->xcs & 3) | (regs->eflags & VM_MASK)) != 0;
+	return ((regs->xcs & SEGMENT_RPL_MASK) | (regs->eflags & VM_MASK)) >= USER_RPL;
 }
 #define instruction_pointer(regs) ((regs)->eip)
 #if defined(CONFIG_SMP) && defined(CONFIG_FRAME_POINTER)
--- 2.6.18-rc3-32.orig/include/asm-i386/segment.h
+++ 2.6.18-rc3-32/include/asm-i386/segment.h
@@ -83,6 +83,11 @@
 
 #define GDT_SIZE (GDT_ENTRIES * 8)
 
+/* Matches __KERNEL_CS and __USER_CS (they must be 2 entries apart) */
+#define SEGMENT_IS_FLAT_CODE(x)  (((x) & 0xec) == GDT_ENTRY_KERNEL_CS * 8)
+/* Matches PNP_CS32 and PNP_CS16 (they must be consecutive) */
+#define SEGMENT_IS_PNP_CODE(x)   (((x) & 0xf4) == GDT_ENTRY_PNPBIOS_BASE * 8)
+
 /* Simple and small GDT entries for booting only */
 
 #define GDT_ENTRY_BOOT_CS		2
@@ -112,4 +117,11 @@
  */
 #define IDT_ENTRIES 256
 
+/* Bottom two bits of xcs give the ring privilege level */
+#define SEGMENT_RPL_MASK	0x3
+
+/* User mode is privilege level 3 */
+#define USER_RPL		0x3
+
+#define get_kernel_rpl()  0
 #endif
-- 
Chuck

* Re: [patch 3/8] Allow a kernel to not be in ring 0.
@ 2006-08-05  5:40 Chuck Ebbert
  0 siblings, 0 replies; 43+ messages in thread
From: Chuck Ebbert @ 2006-08-05  5:40 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Rusty Russell, Xen-devel, virtualization, linux-kernel,
	Jeremy Fitzhardinge, Jeremy Fitzhardinge

In-Reply-To: <44D41DEF.7040301@vmware.com>

On Fri, 04 Aug 2006 21:26:23 -0700, Zachary Amsden wrote:
>
> These changes look great.  Ack-ed.

I re-did it as a patch on top of the original so it's easier to see
what I changed.  Also added macros for table indicator.




Cleanup of the patch for letting the kernel run in a ring other than ring 0:

a. Add some comments about the SEGMENT_IS_*_CODE() macros.
b. Add a USER_RPL macro.  (Code was comparing a value to a mask
   in some places and to the magic number 3 in other places.)
c. Add macros for table indicator field and use them.
d. Change the entry.S tests for LDT stack segment to use the macros.

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Acked-by: Zachary Amsden <zach@vmware.com>

 arch/i386/kernel/entry.S   |    6 +++---
 include/asm-i386/ptrace.h  |    4 ++--
 include/asm-i386/segment.h |   17 ++++++++++++-----
 3 files changed, 17 insertions(+), 10 deletions(-)

--- 2.6.18-rc3-32.orig/arch/i386/kernel/entry.S
+++ 2.6.18-rc3-32/arch/i386/kernel/entry.S
@@ -230,7 +230,7 @@ check_userspace:
 	movl EFLAGS(%esp), %eax		# mix EFLAGS and CS
 	movb CS(%esp), %al
 	andl $(VM_MASK | SEGMENT_RPL_MASK), %eax
-	cmpl $SEGMENT_RPL_MASK, %eax
+	cmpl $USER_RPL, %eax
 	jb resume_kernel		# not returning to v8086 or userspace
 ENTRY(resume_userspace)
  	cli				# make sure we don't miss an interrupt
@@ -368,8 +368,8 @@ restore_all:
 	# See comments in process.c:copy_thread() for details.
 	movb OLDSS(%esp), %ah
 	movb CS(%esp), %al
-	andl $(VM_MASK | (4 << 8) | 3), %eax
-	cmpl $((4 << 8) | 3), %eax
+	andl $(VM_MASK | (SEGMENT_TI_MASK << 8) | SEGMENT_RPL_MASK), %eax
+	cmpl $((SEGMENT_LDT << 8) | USER_RPL), %eax
 	CFI_REMEMBER_STATE
 	je ldt_ss			# returning to user-space with LDT SS
 restore_nocheck:
--- 2.6.18-rc3-32.orig/include/asm-i386/ptrace.h
+++ 2.6.18-rc3-32/include/asm-i386/ptrace.h
@@ -74,11 +74,11 @@ extern void send_sigtrap(struct task_str
  */
 static inline int user_mode(struct pt_regs *regs)
 {
-	return (regs->xcs & SEGMENT_RPL_MASK) == 3;
+	return (regs->xcs & SEGMENT_RPL_MASK) == USER_RPL;
 }
 static inline int user_mode_vm(struct pt_regs *regs)
 {
-	return (((regs->xcs & SEGMENT_RPL_MASK) | (regs->eflags & VM_MASK)) >= 3);
+	return ((regs->xcs & SEGMENT_RPL_MASK) | (regs->eflags & VM_MASK)) >= USER_RPL;
 }
 #define instruction_pointer(regs) ((regs)->eip)
 #if defined(CONFIG_SMP) && defined(CONFIG_FRAME_POINTER)
--- 2.6.18-rc3-32.orig/include/asm-i386/segment.h
+++ 2.6.18-rc3-32/include/asm-i386/segment.h
@@ -83,10 +83,9 @@
 
 #define GDT_SIZE (GDT_ENTRIES * 8)
 
-/*
- * Some tricky tests to match code segments after a fault
- */
+/* Matches __KERNEL_CS and __USER_CS (they must be 2 entries apart) */
 #define SEGMENT_IS_FLAT_CODE(x)  (((x) & 0xec) == GDT_ENTRY_KERNEL_CS * 8)
+/* Matches PNP_CS32 and PNP_CS16 (they must be consecutive) */
 #define SEGMENT_IS_PNP_CODE(x)   (((x) & 0xf4) == GDT_ENTRY_PNPBIOS_BASE * 8)
 
 /* Simple and small GDT entries for booting only */
@@ -118,8 +117,16 @@
  */
 #define IDT_ENTRIES 256
 
-/* Bottom three bits of xcs give the ring privilege level */
-#define SEGMENT_RPL_MASK 0x3
+/* Bottom two bits of selector give the ring privilege level */
+#define SEGMENT_RPL_MASK	0x3
+/* Bit 2 is table indicator (LDT/GDT) */
+#define SEGMENT_TI_MASK		0x4
+
+/* User mode is privilege level 3 */
+#define USER_RPL		0x3
+/* LDT segment has TI set, GDT has it cleared */
+#define SEGMENT_LDT		0x4
+#define SEGMENT_GDT		0x0
 
 #define get_kernel_rpl()  0
 #endif
-- 
Chuck

Thread overview: 43+ messages
2006-08-03  0:25 [patch 0/8] Basic infrastructure patches for a paravirtualized kernel Jeremy Fitzhardinge
2006-08-03  0:25 ` [patch 1/8] Remove locally-defined ldt structure in favour of standard type Jeremy Fitzhardinge
2006-08-03  0:25 ` [patch 2/8] Implement always-locked bit ops, for memory shared with an SMP hypervisor Jeremy Fitzhardinge
2006-08-03  0:28   ` Christoph Lameter
2006-08-03  0:35     ` Jeremy Fitzhardinge
2006-08-03  1:06       ` Christoph Lameter
2006-08-03  1:18         ` Zachary Amsden
2006-08-03  1:25           ` Christoph Lameter
2006-08-03  3:55             ` Andi Kleen
2006-08-03  4:25               ` Christoph Lameter
2006-08-03  4:47                 ` Andi Kleen
2006-08-03  2:45         ` Andi Kleen
2006-08-03  4:27           ` Christoph Lameter
2006-08-03  4:49             ` Andi Kleen
2006-08-03  5:19               ` Christoph Lameter
2006-08-03  5:25                 ` Andi Kleen
2006-08-03  5:32                   ` Christoph Lameter
2006-08-03  5:39                     ` Andi Kleen
2006-08-03  5:54                       ` Christoph Lameter
2006-08-03  6:02                         ` Andi Kleen
2006-08-03 16:49                           ` Christoph Lameter
2006-08-03 17:18                             ` Chris Wright
2006-08-04  0:47                             ` Andi Kleen
2006-08-04  2:16                               ` Christoph Lameter
2006-08-03  0:25 ` [patch 3/8] Allow a kernel to not be in ring 0 Jeremy Fitzhardinge
2006-08-03  0:25 ` [patch 4/8] Replace sensitive instructions with macros Jeremy Fitzhardinge
2006-08-03  0:25 ` [patch 5/8] Roll all the cpuid asm into one __cpuid call Jeremy Fitzhardinge
2006-08-03  0:25 ` [patch 6/8] Make __FIXADDR_TOP variable to allow it to make space for a hypervisor Jeremy Fitzhardinge
2006-08-03  0:25 ` [patch 7/8] Add a bootparameter to reserve high linear address space Jeremy Fitzhardinge
1970-01-01  0:15   ` Pavel Machek
2006-08-07  2:10     ` Andi Kleen
2010-05-04 23:37     ` Jeremy Fitzhardinge
2006-08-03  6:19   ` Andrew Morton
2006-08-03  7:33     ` Zachary Amsden
2006-08-03  7:41       ` Andrew Morton
2006-08-03  8:58         ` Zachary Amsden
2006-08-05 21:58           ` Andrew Morton
2006-08-05 22:52             ` Zachary Amsden
2006-08-05 23:17             ` Rusty Russell
2006-08-03  0:25 ` [patch 8/8] Put .note.* sections into a PT_NOTE segment in vmlinux Jeremy Fitzhardinge
  -- strict thread matches above, loose matches on Subject: below --
2006-08-05  0:41 [patch 3/8] Allow a kernel to not be in ring 0 Chuck Ebbert
2006-08-05  4:26 ` Zachary Amsden
2006-08-05  5:40 Chuck Ebbert
