public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH] axe the store_gdt() pvops call. (v1)
@ 2013-04-05 20:42 Konrad Rzeszutek Wilk
  2013-04-05 20:42 ` [PATCH 1/4] x86/gdt/64-bit: store/load GDT for ACPI S3 or hibernate/resume path is not needed Konrad Rzeszutek Wilk
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-04-05 20:42 UTC (permalink / raw)
  To: linux-kernel, xen-devel, x86, hpa, konrad

Long long time ago (way back in October 2012), when I posted the patches
that would make it possible to do ACPI S3 with Xen, Peter pointed out that:
"excellent set of pvops calls that should be nukable to Kingdom Come. There
is no reason, ever, to read the IDT and GDT from the kernel... the kernel
already knows what they should be!"

http://lkml.indiana.edu/hypermail/linux/kernel/1210.2/01555.html

Merge windows happens, bugs happen, and only this week I was able to carve
out some time to dig a bit in this. I started with the GDT and found
out that we can remove it. My fear was that ACPI S3 would break but fortunatly
it has its own mechanism for reloading the GDT (and as the 32-bit
patch shows - it has an redundant one as well!). Tested on 32 bit and
64-bit kernels - doing ACPI S3 and hibernate as well. The machines
were ThinkPad T61 and an Asus M5A97.

This RFC patch does the removal of the store_gdt() and as well some of
the 32-bit ACPI S3 code. Please review at your leisure.

The patches are also visible at:
 
 git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git devel/for-hpa-3.10

 arch/x86/include/asm/paravirt.h       |  4 ----
 arch/x86/include/asm/paravirt_types.h |  2 +-
 arch/x86/include/asm/suspend_32.h     |  1 -
 arch/x86/include/asm/suspend_64.h     |  3 ---
 arch/x86/kernel/acpi/sleep.c          |  2 +-
 arch/x86/kernel/acpi/wakeup_32.S      |  3 ---
 arch/x86/kernel/doublefault_32.c      |  2 +-
 arch/x86/kernel/paravirt.c            |  1 -
 arch/x86/kvm/vmx.c                    |  2 +-
 arch/x86/power/cpu.c                  | 13 +++++++------
 arch/x86/xen/enlighten.c              |  1 -
 11 files changed, 11 insertions(+), 23 deletions(-)

Konrad Rzeszutek Wilk (3):
      x86/gdt/64-bit: store/load GDT for ACPI S3 or hibernate/resume path is not needed.
      x86/gdt/i386: store/load GDT for ACPI S3 or hibernation/resume path is not needed
      x86/xen/store_gdt: Remove the pvops variant of store_gdt.

konrad@kernel.org (1):
      x86/wakeup/sleep:  Use pvops functions for changing GDT entries.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/4] x86/gdt/64-bit: store/load GDT for ACPI S3 or hibernate/resume path is not needed.
  2013-04-05 20:42 [RFC PATCH] axe the store_gdt() pvops call. (v1) Konrad Rzeszutek Wilk
@ 2013-04-05 20:42 ` Konrad Rzeszutek Wilk
  2013-04-11 22:59   ` [tip:x86/paravirt] x86-64, gdt: Store/ load " tip-bot for Konrad Rzeszutek Wilk
  2013-04-05 20:42 ` [PATCH 2/4] x86/gdt/i386: store/load GDT for ACPI S3 or hibernation/resume " Konrad Rzeszutek Wilk
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-04-05 20:42 UTC (permalink / raw)
  To: linux-kernel, xen-devel, x86, hpa, konrad; +Cc: Konrad Rzeszutek Wilk

During the ACPI S3 resume path the trampoline code handles it already.

During the ACPI S3 suspend phase (acpi_suspend_lowlevel) we set:
early_gdt_descr.address = (..)get_cpu_gdt_table(smp_processor_id());

which is then used during the resume path and has the same exact
value as what the store/load_gdt do with the saved_context
(which is saved/restored via save/restore_processor_state()).

The flow during resume is complex and for 64-bit kernels we use three GDTs
- one early bootstrap GDT (wakeup_igdt) that we load to workaround
broken BIOSes, an early Protected Mode to Long Mode transition one
(tr_gdt), and the final one - early_gdt_descr (which points to the real GDT).

The early ('wakeup_gdt') is loaded in 'trampoline_start' for working
around broken BIOSes, and then when we end up in Protected Mode in the
startup_32 (in trampoline_64.s, not head_32.s) we use the 'tr_gdt'
(still in trampoline_64.s). This 'tr_gdt' has a a 32-bit code segment,
64-bit code segment with L=1, and a 32-bit data segment.

Once we have transitioned from Protected Mode to Long Mode we then
set the GDT to 'early_gdt_desc' and then via an iretq emerge in
wakeup_long64 (set via 'initial_code' variable in acpi_suspend_lowlevel).

In the wakeup_long64 we end up restoring the %rip (which is set to
'resume_point') and jump there.

In 'resume_point' we call 'restore_processor_state' which does
the load_gdt on the saved context. This load_gdt is redundant as the
GDT loaded via early_gdt_desc is the same.

Here is the call-chain:
 wakeup_start
   |- lgdtl wakeup_gdt [the work-around broken BIOSes]
   |
   \-- trampoline_start (trampoline_64.S)
         |- lgdtl tr_gdt
         |
         \-- startup_32 (trampoline_64.S)
               |
               \-- startup_64 (trampoline_64.S)
                      |
                      \-- secondary_startup_64
                               |- lgdtl early_gdt_desc
                               | ...
                               |- movq initial_code(%rip), %eax
                               |-.. lretq
                               \-- wakeup_64
                                     |-- other registers are reloaded
                                     |-- call restore_processor_state

The hibernate path is much simpler. During the saving of the hibernation
image we call save_processor_state() and save the contents of that along
with the rest of the kernel in the hibernation image destination.
We save the EIP of 'restore_registers' (restore_jump_address) and cr3
(restore_cr3).

During hibernate resume, the 'restore_registers' (via the
'restore_jump_address) in hibernate_asm_64.S is invoked which restores
the contents of most registers. Naturally the resume path benefits from
already being in 64-bit mode, so it does not have to load the GDT.

It only reloads the cr3 (from restore_cr3) and continues on. Note that
the restoration of the restore image page-tables is done prior to this.

After the 'restore_registers' it returns and we end up called
restore_processor_state() - where we reload the GDT. The reload of
the GDT is not needed as bootup kernel has already loaded the GDT which
is at the same physical location as the the restored kernel.

Note that the hibernation path assumes the GDT is correct during its
'restore_registers'. The assumption in the code is that the restored
image is the same as saved - meaning we are not trying to restore
an different kernel in the virtual address space of a new kernel.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/include/asm/suspend_64.h | 3 ---
 arch/x86/power/cpu.c              | 2 --
 2 files changed, 5 deletions(-)

diff --git a/arch/x86/include/asm/suspend_64.h b/arch/x86/include/asm/suspend_64.h
index 09b0bf1..97b84e0 100644
--- a/arch/x86/include/asm/suspend_64.h
+++ b/arch/x86/include/asm/suspend_64.h
@@ -25,9 +25,6 @@ struct saved_context {
 	u64 misc_enable;
 	bool misc_enable_saved;
 	unsigned long efer;
-	u16 gdt_pad;
-	u16 gdt_limit;
-	unsigned long gdt_base;
 	u16 idt_pad;
 	u16 idt_limit;
 	unsigned long idt_base;
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 3c68768..fdca260 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -66,7 +66,6 @@ static void __save_processor_state(struct saved_context *ctxt)
 	store_idt(&ctxt->idt);
 #else
 /* CONFIG_X86_64 */
-	store_gdt((struct desc_ptr *)&ctxt->gdt_limit);
 	store_idt((struct desc_ptr *)&ctxt->idt_limit);
 #endif
 	store_tr(ctxt->tr);
@@ -187,7 +186,6 @@ static void __restore_processor_state(struct saved_context *ctxt)
 	load_idt(&ctxt->idt);
 #else
 /* CONFIG_X86_64 */
-	load_gdt((const struct desc_ptr *)&ctxt->gdt_limit);
 	load_idt((const struct desc_ptr *)&ctxt->idt_limit);
 #endif
 
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/4] x86/gdt/i386: store/load GDT for ACPI S3 or hibernation/resume path is not needed
  2013-04-05 20:42 [RFC PATCH] axe the store_gdt() pvops call. (v1) Konrad Rzeszutek Wilk
  2013-04-05 20:42 ` [PATCH 1/4] x86/gdt/64-bit: store/load GDT for ACPI S3 or hibernate/resume path is not needed Konrad Rzeszutek Wilk
@ 2013-04-05 20:42 ` Konrad Rzeszutek Wilk
  2013-04-11 23:00   ` [tip:x86/paravirt] x86-32, gdt: Store/ load " tip-bot for Konrad Rzeszutek Wilk
  2013-04-05 20:42 ` [PATCH 3/4] x86/xen/store_gdt: Remove the pvops variant of store_gdt Konrad Rzeszutek Wilk
  2013-04-05 20:42 ` [PATCH 4/4] x86/wakeup/sleep: Use pvops functions for changing GDT entries Konrad Rzeszutek Wilk
  3 siblings, 1 reply; 13+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-04-05 20:42 UTC (permalink / raw)
  To: linux-kernel, xen-devel, x86, hpa, konrad; +Cc: Konrad Rzeszutek Wilk

During the ACPI S3 suspend, we store the GDT in the wakup_header (see
wakeup_asm.s) field called 'pmode_gdt'.

Which is then used during the resume path and has the same exact
value as what the store/load_gdt do with the saved_context
(which is saved/restored via save/restore_processor_state()).

The flow during resume from ACPI S3 is simpler than the 64-bit
counterpart. We only use the early bootstrap once (wakeup_gdt) and
do various checks in real mode.

After the checks are completed, we load the saved GDT ('pmode_gdt') and
continue on with the resume (by heading to startup_32 in trampoline_32.S) -
which quickly jumps to what was saved in 'pmode_entry'
aka 'wakeup_pmode_return'.

The 'wakeup_pmode_return' restores the GDT (saved_gdt) again (which was
saved in do_suspend_lowlevel initially). After that it ends up calling
the 'ret_point' which calls 'restore_processor_state()'.

We have two opportunities to remove code where we restore the same GDT
twice.

Here is the call chain:
 wakeup_start
       |- lgdtl wakeup_gdt [the work-around broken BIOSes]
       |
       | - lgdtl pmode_gdt [the real one]
       |
       \-- startup_32 (in trampoline_32.S)
              \-- wakeup_pmode_return (in wakeup_32.S)
                       |- lgdtl saved_gdt [the real one]
                       \-- ret_point
                             |..
                             |- call restore_processor_state

The hibernate path is much simpler. During the saving of the hibernation
image we call save_processor_state() and save the contents of that
along with the rest of the kernel in the hibernation image destination.
We save the EIP of 'restore_registers' (restore_jump_address) and
cr3 (restore_cr3).

During hibernate resume, the 'restore_registers' (via the
'restore_jump_address) in hibernate_asm_32.S is invoked which
restores the contents of most registers. Naturally the resume path benefits
from already being in 32-bit mode, so it does not have to reload the GDT.

It only reloads the cr3 (from restore_cr3) and continues on. Note
that the restoration of the restore image page-tables is done prior to
this.

After the 'restore_registers' it returns and we end up called
restore_processor_state() - where we reload the GDT. The reload of
the GDT is not needed as bootup kernel has already loaded the GDT
which is at the same physical location as the the restored kernel.

Note that the hibernation path assumes the GDT is correct during its
'restore_registers'. The assumption in the code is that the restored
image is the same as saved - meaning we are not trying to restore
an different kernel in the virtual address space of a new kernel.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/include/asm/suspend_32.h | 1 -
 arch/x86/kernel/acpi/wakeup_32.S  | 3 ---
 arch/x86/power/cpu.c              | 2 --
 3 files changed, 6 deletions(-)

diff --git a/arch/x86/include/asm/suspend_32.h b/arch/x86/include/asm/suspend_32.h
index 487055c..f6064b7 100644
--- a/arch/x86/include/asm/suspend_32.h
+++ b/arch/x86/include/asm/suspend_32.h
@@ -15,7 +15,6 @@ struct saved_context {
 	unsigned long cr0, cr2, cr3, cr4;
 	u64 misc_enable;
 	bool misc_enable_saved;
-	struct desc_ptr gdt;
 	struct desc_ptr idt;
 	u16 ldt;
 	u16 tss;
diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
index 13ab720..91adb1b 100644
--- a/arch/x86/kernel/acpi/wakeup_32.S
+++ b/arch/x86/kernel/acpi/wakeup_32.S
@@ -18,7 +18,6 @@ wakeup_pmode_return:
 	movw	%ax, %gs
 
 	# reload the gdt, as we need the full 32 bit address
-	lgdt	saved_gdt
 	lidt	saved_idt
 	lldt	saved_ldt
 	ljmp	$(__KERNEL_CS), $1f
@@ -44,7 +43,6 @@ bogus_magic:
 
 
 save_registers:
-	sgdt	saved_gdt
 	sidt	saved_idt
 	sldt	saved_ldt
 	str	saved_tss
@@ -93,7 +91,6 @@ ENTRY(saved_magic)	.long	0
 ENTRY(saved_eip)	.long	0
 
 # saved registers
-saved_gdt:	.long	0,0
 saved_idt:	.long	0,0
 saved_ldt:	.long	0
 saved_tss:	.long	0
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index fdca260..571176f 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -62,7 +62,6 @@ static void __save_processor_state(struct saved_context *ctxt)
 	 * descriptor tables
 	 */
 #ifdef CONFIG_X86_32
-	store_gdt(&ctxt->gdt);
 	store_idt(&ctxt->idt);
 #else
 /* CONFIG_X86_64 */
@@ -182,7 +181,6 @@ static void __restore_processor_state(struct saved_context *ctxt)
 	 * ltr is done i fix_processor_context().
 	 */
 #ifdef CONFIG_X86_32
-	load_gdt(&ctxt->gdt);
 	load_idt(&ctxt->idt);
 #else
 /* CONFIG_X86_64 */
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/4] x86/xen/store_gdt: Remove the pvops variant of store_gdt.
  2013-04-05 20:42 [RFC PATCH] axe the store_gdt() pvops call. (v1) Konrad Rzeszutek Wilk
  2013-04-05 20:42 ` [PATCH 1/4] x86/gdt/64-bit: store/load GDT for ACPI S3 or hibernate/resume path is not needed Konrad Rzeszutek Wilk
  2013-04-05 20:42 ` [PATCH 2/4] x86/gdt/i386: store/load GDT for ACPI S3 or hibernation/resume " Konrad Rzeszutek Wilk
@ 2013-04-05 20:42 ` Konrad Rzeszutek Wilk
  2013-04-11 23:01   ` [tip:x86/paravirt] x86, xen, gdt: " tip-bot for Konrad Rzeszutek Wilk
  2013-04-05 20:42 ` [PATCH 4/4] x86/wakeup/sleep: Use pvops functions for changing GDT entries Konrad Rzeszutek Wilk
  3 siblings, 1 reply; 13+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-04-05 20:42 UTC (permalink / raw)
  To: linux-kernel, xen-devel, x86, hpa, konrad; +Cc: Konrad Rzeszutek Wilk

The two use-cases where we needed to store the GDT were during ACPI S3 suspend
and resume. As the patches:
 x86/gdt/i386: store/load GDT for ACPI S3 or hibernation/resume path is not needed
 x86/gdt/64-bit: store/load GDT for ACPI S3 or hibernate/resume path is not needed.

have demonstrated - there are other mechanism by which the GDT is
saved and reloaded during early resume path.

Hence we do not need to worry about the pvops call-chain for saving the
GDT and can and can eliminate it. The other areas where the store_gdt is
used are never going to be hit when running under the pvops platforms.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/include/asm/paravirt.h       | 4 ----
 arch/x86/include/asm/paravirt_types.h | 2 +-
 arch/x86/kernel/acpi/sleep.c          | 2 +-
 arch/x86/kernel/doublefault_32.c      | 2 +-
 arch/x86/kernel/paravirt.c            | 1 -
 arch/x86/kvm/vmx.c                    | 2 +-
 arch/x86/xen/enlighten.c              | 1 -
 7 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 5edd174..7365128 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -262,10 +262,6 @@ static inline void set_ldt(const void *addr, unsigned entries)
 {
 	PVOP_VCALL2(pv_cpu_ops.set_ldt, addr, entries);
 }
-static inline void store_gdt(struct desc_ptr *dtr)
-{
-	PVOP_VCALL1(pv_cpu_ops.store_gdt, dtr);
-}
 static inline void store_idt(struct desc_ptr *dtr)
 {
 	PVOP_VCALL1(pv_cpu_ops.store_idt, dtr);
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 142236e..b6c69a6 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -122,7 +122,7 @@ struct pv_cpu_ops {
 	void (*load_tr_desc)(void);
 	void (*load_gdt)(const struct desc_ptr *);
 	void (*load_idt)(const struct desc_ptr *);
-	void (*store_gdt)(struct desc_ptr *);
+	/* store_gdt has been removed. */
 	void (*store_idt)(struct desc_ptr *);
 	void (*set_ldt)(const void *desc, unsigned entries);
 	unsigned long (*store_tr)(void);
diff --git a/arch/x86/kernel/acpi/sleep.c b/arch/x86/kernel/acpi/sleep.c
index 0532f5d..b44577b 100644
--- a/arch/x86/kernel/acpi/sleep.c
+++ b/arch/x86/kernel/acpi/sleep.c
@@ -46,7 +46,7 @@ int acpi_suspend_lowlevel(void)
 	header->pmode_behavior = 0;
 
 #ifndef CONFIG_64BIT
-	store_gdt((struct desc_ptr *)&header->pmode_gdt);
+	native_store_gdt((struct desc_ptr *)&header->pmode_gdt);
 
 	if (!rdmsr_safe(MSR_EFER,
 			&header->pmode_efer_low,
diff --git a/arch/x86/kernel/doublefault_32.c b/arch/x86/kernel/doublefault_32.c
index 37250fe..155a13f 100644
--- a/arch/x86/kernel/doublefault_32.c
+++ b/arch/x86/kernel/doublefault_32.c
@@ -20,7 +20,7 @@ static void doublefault_fn(void)
 	struct desc_ptr gdt_desc = {0, 0};
 	unsigned long gdt, tss;
 
-	store_gdt(&gdt_desc);
+	native_store_gdt(&gdt_desc);
 	gdt = gdt_desc.address;
 
 	printk(KERN_EMERG "PANIC: double fault, gdt at %08lx [%d bytes]\n", gdt, gdt_desc.size);
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 17fff18..4ae3d23 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -360,7 +360,6 @@ struct pv_cpu_ops pv_cpu_ops = {
 	.set_ldt = native_set_ldt,
 	.load_gdt = native_load_gdt,
 	.load_idt = native_load_idt,
-	.store_gdt = native_store_gdt,
 	.store_idt = native_store_idt,
 	.store_tr = native_store_tr,
 	.load_tls = native_load_tls,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6667042..867b810 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2459,7 +2459,7 @@ static int hardware_enable(void *garbage)
 		ept_sync_global();
 	}
 
-	store_gdt(&__get_cpu_var(host_gdt));
+	native_store_gdt(&__get_cpu_var(host_gdt));
 
 	return 0;
 }
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index c8e1c7b..ddab539 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1220,7 +1220,6 @@ static const struct pv_cpu_ops xen_cpu_ops __initconst = {
 	.alloc_ldt = xen_alloc_ldt,
 	.free_ldt = xen_free_ldt,
 
-	.store_gdt = native_store_gdt,
 	.store_idt = native_store_idt,
 	.store_tr = xen_store_tr,
 
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/4] x86/wakeup/sleep:  Use pvops functions for changing GDT entries.
  2013-04-05 20:42 [RFC PATCH] axe the store_gdt() pvops call. (v1) Konrad Rzeszutek Wilk
                   ` (2 preceding siblings ...)
  2013-04-05 20:42 ` [PATCH 3/4] x86/xen/store_gdt: Remove the pvops variant of store_gdt Konrad Rzeszutek Wilk
@ 2013-04-05 20:42 ` Konrad Rzeszutek Wilk
  2013-04-11 23:02   ` [tip:x86/paravirt] x86, wakeup, sleep: " tip-bot for konrad@kernel.org
  3 siblings, 1 reply; 13+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-04-05 20:42 UTC (permalink / raw)
  To: linux-kernel, xen-devel, x86, hpa, konrad; +Cc: Konrad Rzeszutek Wilk

From: "konrad@kernel.org" <konrad@kernel.org>

We check the TSS descriptor before we try to dereference it.
Also we document what the value '9' actually means using the
AMD64 Architecture Programmer's Manual Volume 2, pg 90:
"Hex value 9: Available 64-bit TSS" and pg 91:
"The available 32-bit TSS (09h), which is redefined as the
available 64-bit TSS."

Without this, on Xen, where the GDT is available as R/O (to
protect the hypervisor from the guest modifying it), we end up
with a pagetable fault.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/power/cpu.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 571176f..6d6e907 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -133,7 +133,10 @@ static void fix_processor_context(void)
 {
 	int cpu = smp_processor_id();
 	struct tss_struct *t = &per_cpu(init_tss, cpu);
-
+#ifdef CONFIG_X86_64
+	struct desc_struct *desc = get_cpu_gdt_table(cpu);
+	tss_desc tss;
+#endif
 	set_tss_desc(cpu, t);	/*
 				 * This just modifies memory; should not be
 				 * necessary. But... This is necessary, because
@@ -142,7 +145,9 @@ static void fix_processor_context(void)
 				 */
 
 #ifdef CONFIG_X86_64
-	get_cpu_gdt_table(cpu)[GDT_ENTRY_TSS].type = 9;
+	memcpy(&tss, &desc[GDT_ENTRY_TSS], sizeof(tss_desc));
+	tss.type = 0x9; /* The available 64-bit TSS (see AMD vol 2, pg 91 */
+	write_gdt_entry(desc, GDT_ENTRY_TSS, &tss, DESC_TSS);
 
 	syscall_init();				/* This sets MSR_*STAR and related */
 #endif
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [tip:x86/paravirt] x86-64, gdt: Store/ load GDT for ACPI S3 or hibernate/resume path is not needed.
  2013-04-05 20:42 ` [PATCH 1/4] x86/gdt/64-bit: store/load GDT for ACPI S3 or hibernate/resume path is not needed Konrad Rzeszutek Wilk
@ 2013-04-11 22:59   ` tip-bot for Konrad Rzeszutek Wilk
  2013-04-12 12:02     ` Rafael J. Wysocki
  0 siblings, 1 reply; 13+ messages in thread
From: tip-bot for Konrad Rzeszutek Wilk @ 2013-04-11 22:59 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, konrad.wilk, tglx, hpa, rjw

Commit-ID:  e7a5cd063c7b4c58417f674821d63f5eb6747e37
Gitweb:     http://git.kernel.org/tip/e7a5cd063c7b4c58417f674821d63f5eb6747e37
Author:     Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
AuthorDate: Fri, 5 Apr 2013 16:42:21 -0400
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Thu, 11 Apr 2013 15:39:38 -0700

x86-64, gdt: Store/load GDT for ACPI S3 or hibernate/resume path is not needed.

During the ACPI S3 resume path the trampoline code handles it already.

During the ACPI S3 suspend phase (acpi_suspend_lowlevel) we set:
early_gdt_descr.address = (..)get_cpu_gdt_table(smp_processor_id());

which is then used during the resume path and has the same exact
value as what the store/load_gdt do with the saved_context
(which is saved/restored via save/restore_processor_state()).

The flow during resume is complex and for 64-bit kernels we use three GDTs
- one early bootstrap GDT (wakeup_igdt) that we load to workaround
broken BIOSes, an early Protected Mode to Long Mode transition one
(tr_gdt), and the final one - early_gdt_descr (which points to the real GDT).

The early ('wakeup_gdt') is loaded in 'trampoline_start' for working
around broken BIOSes, and then when we end up in Protected Mode in the
startup_32 (in trampoline_64.s, not head_32.s) we use the 'tr_gdt'
(still in trampoline_64.s). This 'tr_gdt' has a a 32-bit code segment,
64-bit code segment with L=1, and a 32-bit data segment.

Once we have transitioned from Protected Mode to Long Mode we then
set the GDT to 'early_gdt_desc' and then via an iretq emerge in
wakeup_long64 (set via 'initial_code' variable in acpi_suspend_lowlevel).

In the wakeup_long64 we end up restoring the %rip (which is set to
'resume_point') and jump there.

In 'resume_point' we call 'restore_processor_state' which does
the load_gdt on the saved context. This load_gdt is redundant as the
GDT loaded via early_gdt_desc is the same.

Here is the call-chain:
 wakeup_start
   |- lgdtl wakeup_gdt [the work-around broken BIOSes]
   |
   \-- trampoline_start (trampoline_64.S)
         |- lgdtl tr_gdt
         |
         \-- startup_32 (trampoline_64.S)
               |
               \-- startup_64 (trampoline_64.S)
                      |
                      \-- secondary_startup_64
                               |- lgdtl early_gdt_desc
                               | ...
                               |- movq initial_code(%rip), %eax
                               |-.. lretq
                               \-- wakeup_64
                                     |-- other registers are reloaded
                                     |-- call restore_processor_state

The hibernate path is much simpler. During the saving of the hibernation
image we call save_processor_state() and save the contents of that along
with the rest of the kernel in the hibernation image destination.
We save the EIP of 'restore_registers' (restore_jump_address) and cr3
(restore_cr3).

During hibernate resume, the 'restore_registers' (via the
'restore_jump_address) in hibernate_asm_64.S is invoked which restores
the contents of most registers. Naturally the resume path benefits from
already being in 64-bit mode, so it does not have to load the GDT.

It only reloads the cr3 (from restore_cr3) and continues on. Note that
the restoration of the restore image page-tables is done prior to this.

After the 'restore_registers' it returns and we end up called
restore_processor_state() - where we reload the GDT. The reload of
the GDT is not needed as bootup kernel has already loaded the GDT which
is at the same physical location as the the restored kernel.

Note that the hibernation path assumes the GDT is correct during its
'restore_registers'. The assumption in the code is that the restored
image is the same as saved - meaning we are not trying to restore
an different kernel in the virtual address space of a new kernel.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/1365194544-14648-2-git-send-email-konrad.wilk@oracle.com
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/include/asm/suspend_64.h | 3 ---
 arch/x86/power/cpu.c              | 2 --
 2 files changed, 5 deletions(-)

diff --git a/arch/x86/include/asm/suspend_64.h b/arch/x86/include/asm/suspend_64.h
index 09b0bf1..97b84e0 100644
--- a/arch/x86/include/asm/suspend_64.h
+++ b/arch/x86/include/asm/suspend_64.h
@@ -25,9 +25,6 @@ struct saved_context {
 	u64 misc_enable;
 	bool misc_enable_saved;
 	unsigned long efer;
-	u16 gdt_pad;
-	u16 gdt_limit;
-	unsigned long gdt_base;
 	u16 idt_pad;
 	u16 idt_limit;
 	unsigned long idt_base;
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 120cee1..6bd9423 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -65,7 +65,6 @@ static void __save_processor_state(struct saved_context *ctxt)
 	store_idt(&ctxt->idt);
 #else
 /* CONFIG_X86_64 */
-	store_gdt((struct desc_ptr *)&ctxt->gdt_limit);
 	store_idt((struct desc_ptr *)&ctxt->idt_limit);
 #endif
 	store_tr(ctxt->tr);
@@ -186,7 +185,6 @@ static void __restore_processor_state(struct saved_context *ctxt)
 	load_idt(&ctxt->idt);
 #else
 /* CONFIG_X86_64 */
-	load_gdt((const struct desc_ptr *)&ctxt->gdt_limit);
 	load_idt((const struct desc_ptr *)&ctxt->idt_limit);
 #endif
 

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [tip:x86/paravirt] x86-32, gdt: Store/ load GDT for ACPI S3 or hibernation/resume path is not needed
  2013-04-05 20:42 ` [PATCH 2/4] x86/gdt/i386: store/load GDT for ACPI S3 or hibernation/resume " Konrad Rzeszutek Wilk
@ 2013-04-11 23:00   ` tip-bot for Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 13+ messages in thread
From: tip-bot for Konrad Rzeszutek Wilk @ 2013-04-11 23:00 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, konrad.wilk, tglx, hpa, rjw

Commit-ID:  84e70971e67d97bc2db18a4e76d42846272a54bd
Gitweb:     http://git.kernel.org/tip/84e70971e67d97bc2db18a4e76d42846272a54bd
Author:     Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
AuthorDate: Fri, 5 Apr 2013 16:42:22 -0400
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Thu, 11 Apr 2013 15:40:17 -0700

x86-32, gdt: Store/load GDT for ACPI S3 or hibernation/resume path is not needed

During the ACPI S3 suspend, we store the GDT in the wakup_header (see
wakeup_asm.s) field called 'pmode_gdt'.

Which is then used during the resume path and has the same exact
value as what the store/load_gdt do with the saved_context
(which is saved/restored via save/restore_processor_state()).

The flow during resume from ACPI S3 is simpler than the 64-bit
counterpart. We only use the early bootstrap once (wakeup_gdt) and
do various checks in real mode.

After the checks are completed, we load the saved GDT ('pmode_gdt') and
continue on with the resume (by heading to startup_32 in trampoline_32.S) -
which quickly jumps to what was saved in 'pmode_entry'
aka 'wakeup_pmode_return'.

The 'wakeup_pmode_return' restores the GDT (saved_gdt) again (which was
saved in do_suspend_lowlevel initially). After that it ends up calling
the 'ret_point' which calls 'restore_processor_state()'.

We have two opportunities to remove code where we restore the same GDT
twice.

Here is the call chain:
 wakeup_start
       |- lgdtl wakeup_gdt [the work-around broken BIOSes]
       |
       | - lgdtl pmode_gdt [the real one]
       |
       \-- startup_32 (in trampoline_32.S)
              \-- wakeup_pmode_return (in wakeup_32.S)
                       |- lgdtl saved_gdt [the real one]
                       \-- ret_point
                             |..
                             |- call restore_processor_state

The hibernate path is much simpler. During the saving of the hibernation
image we call save_processor_state() and save the contents of that
along with the rest of the kernel in the hibernation image destination.
We save the EIP of 'restore_registers' (restore_jump_address) and
cr3 (restore_cr3).

During hibernate resume, the 'restore_registers' (via the
'restore_jump_address) in hibernate_asm_32.S is invoked which
restores the contents of most registers. Naturally the resume path benefits
from already being in 32-bit mode, so it does not have to reload the GDT.

It only reloads the cr3 (from restore_cr3) and continues on. Note
that the restoration of the restore image page-tables is done prior to
this.

After the 'restore_registers' it returns and we end up called
restore_processor_state() - where we reload the GDT. The reload of
the GDT is not needed as bootup kernel has already loaded the GDT
which is at the same physical location as the the restored kernel.

Note that the hibernation path assumes the GDT is correct during its
'restore_registers'. The assumption in the code is that the restored
image is the same as saved - meaning we are not trying to restore
an different kernel in the virtual address space of a new kernel.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/1365194544-14648-3-git-send-email-konrad.wilk@oracle.com
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/include/asm/suspend_32.h | 1 -
 arch/x86/kernel/acpi/wakeup_32.S  | 3 ---
 arch/x86/power/cpu.c              | 2 --
 3 files changed, 6 deletions(-)

diff --git a/arch/x86/include/asm/suspend_32.h b/arch/x86/include/asm/suspend_32.h
index 487055c..f6064b7 100644
--- a/arch/x86/include/asm/suspend_32.h
+++ b/arch/x86/include/asm/suspend_32.h
@@ -15,7 +15,6 @@ struct saved_context {
 	unsigned long cr0, cr2, cr3, cr4;
 	u64 misc_enable;
 	bool misc_enable_saved;
-	struct desc_ptr gdt;
 	struct desc_ptr idt;
 	u16 ldt;
 	u16 tss;
diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
index 13ab720..91adb1b 100644
--- a/arch/x86/kernel/acpi/wakeup_32.S
+++ b/arch/x86/kernel/acpi/wakeup_32.S
@@ -18,7 +18,6 @@ wakeup_pmode_return:
 	movw	%ax, %gs
 
 	# reload the gdt, as we need the full 32 bit address
-	lgdt	saved_gdt
 	lidt	saved_idt
 	lldt	saved_ldt
 	ljmp	$(__KERNEL_CS), $1f
@@ -44,7 +43,6 @@ bogus_magic:
 
 
 save_registers:
-	sgdt	saved_gdt
 	sidt	saved_idt
 	sldt	saved_ldt
 	str	saved_tss
@@ -93,7 +91,6 @@ ENTRY(saved_magic)	.long	0
 ENTRY(saved_eip)	.long	0
 
 # saved registers
-saved_gdt:	.long	0,0
 saved_idt:	.long	0,0
 saved_ldt:	.long	0
 saved_tss:	.long	0
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 6bd9423..82c39c5 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -61,7 +61,6 @@ static void __save_processor_state(struct saved_context *ctxt)
 	 * descriptor tables
 	 */
 #ifdef CONFIG_X86_32
-	store_gdt(&ctxt->gdt);
 	store_idt(&ctxt->idt);
 #else
 /* CONFIG_X86_64 */
@@ -181,7 +180,6 @@ static void __restore_processor_state(struct saved_context *ctxt)
 	 * ltr is done i fix_processor_context().
 	 */
 #ifdef CONFIG_X86_32
-	load_gdt(&ctxt->gdt);
 	load_idt(&ctxt->idt);
 #else
 /* CONFIG_X86_64 */

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [tip:x86/paravirt] x86, xen, gdt: Remove the pvops variant of store_gdt.
  2013-04-05 20:42 ` [PATCH 3/4] x86/xen/store_gdt: Remove the pvops variant of store_gdt Konrad Rzeszutek Wilk
@ 2013-04-11 23:01   ` tip-bot for Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 13+ messages in thread
From: tip-bot for Konrad Rzeszutek Wilk @ 2013-04-11 23:01 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, konrad.wilk, tglx, hpa

Commit-ID:  357d122670937c35b33d99c46356ef2b63182a1f
Gitweb:     http://git.kernel.org/tip/357d122670937c35b33d99c46356ef2b63182a1f
Author:     Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
AuthorDate: Fri, 5 Apr 2013 16:42:23 -0400
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Thu, 11 Apr 2013 15:40:38 -0700

x86, xen, gdt: Remove the pvops variant of store_gdt.

The two use-cases where we needed to store the GDT were during ACPI S3 suspend
and resume. As the patches:
 x86/gdt/i386: store/load GDT for ACPI S3 or hibernation/resume path is not needed
 x86/gdt/64-bit: store/load GDT for ACPI S3 or hibernate/resume path is not needed.

have demonstrated - there are other mechanism by which the GDT is
saved and reloaded during early resume path.

Hence we do not need to worry about the pvops call-chain for saving the
GDT and can and can eliminate it. The other areas where the store_gdt is
used are never going to be hit when running under the pvops platforms.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/1365194544-14648-4-git-send-email-konrad.wilk@oracle.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/include/asm/paravirt.h       | 4 ----
 arch/x86/include/asm/paravirt_types.h | 2 +-
 arch/x86/kernel/acpi/sleep.c          | 2 +-
 arch/x86/kernel/doublefault_32.c      | 2 +-
 arch/x86/kernel/paravirt.c            | 1 -
 arch/x86/kvm/vmx.c                    | 2 +-
 arch/x86/xen/enlighten.c              | 1 -
 7 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 5edd174..7365128 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -262,10 +262,6 @@ static inline void set_ldt(const void *addr, unsigned entries)
 {
 	PVOP_VCALL2(pv_cpu_ops.set_ldt, addr, entries);
 }
-static inline void store_gdt(struct desc_ptr *dtr)
-{
-	PVOP_VCALL1(pv_cpu_ops.store_gdt, dtr);
-}
 static inline void store_idt(struct desc_ptr *dtr)
 {
 	PVOP_VCALL1(pv_cpu_ops.store_idt, dtr);
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 142236e..b6c69a6 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -122,7 +122,7 @@ struct pv_cpu_ops {
 	void (*load_tr_desc)(void);
 	void (*load_gdt)(const struct desc_ptr *);
 	void (*load_idt)(const struct desc_ptr *);
-	void (*store_gdt)(struct desc_ptr *);
+	/* store_gdt has been removed. */
 	void (*store_idt)(struct desc_ptr *);
 	void (*set_ldt)(const void *desc, unsigned entries);
 	unsigned long (*store_tr)(void);
diff --git a/arch/x86/kernel/acpi/sleep.c b/arch/x86/kernel/acpi/sleep.c
index 0532f5d..b44577b 100644
--- a/arch/x86/kernel/acpi/sleep.c
+++ b/arch/x86/kernel/acpi/sleep.c
@@ -46,7 +46,7 @@ int acpi_suspend_lowlevel(void)
 	header->pmode_behavior = 0;
 
 #ifndef CONFIG_64BIT
-	store_gdt((struct desc_ptr *)&header->pmode_gdt);
+	native_store_gdt((struct desc_ptr *)&header->pmode_gdt);
 
 	if (!rdmsr_safe(MSR_EFER,
 			&header->pmode_efer_low,
diff --git a/arch/x86/kernel/doublefault_32.c b/arch/x86/kernel/doublefault_32.c
index 37250fe..155a13f 100644
--- a/arch/x86/kernel/doublefault_32.c
+++ b/arch/x86/kernel/doublefault_32.c
@@ -20,7 +20,7 @@ static void doublefault_fn(void)
 	struct desc_ptr gdt_desc = {0, 0};
 	unsigned long gdt, tss;
 
-	store_gdt(&gdt_desc);
+	native_store_gdt(&gdt_desc);
 	gdt = gdt_desc.address;
 
 	printk(KERN_EMERG "PANIC: double fault, gdt at %08lx [%d bytes]\n", gdt, gdt_desc.size);
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 17fff18..4ae3d23 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -360,7 +360,6 @@ struct pv_cpu_ops pv_cpu_ops = {
 	.set_ldt = native_set_ldt,
 	.load_gdt = native_load_gdt,
 	.load_idt = native_load_idt,
-	.store_gdt = native_store_gdt,
 	.store_idt = native_store_idt,
 	.store_tr = native_store_tr,
 	.load_tls = native_load_tls,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6667042..867b810 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2459,7 +2459,7 @@ static int hardware_enable(void *garbage)
 		ept_sync_global();
 	}
 
-	store_gdt(&__get_cpu_var(host_gdt));
+	native_store_gdt(&__get_cpu_var(host_gdt));
 
 	return 0;
 }
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index c8e1c7b..ddab539 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1220,7 +1220,6 @@ static const struct pv_cpu_ops xen_cpu_ops __initconst = {
 	.alloc_ldt = xen_alloc_ldt,
 	.free_ldt = xen_free_ldt,
 
-	.store_gdt = native_store_gdt,
 	.store_idt = native_store_idt,
 	.store_tr = xen_store_tr,
 

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [tip:x86/paravirt] x86, wakeup, sleep: Use pvops functions for changing GDT entries
  2013-04-05 20:42 ` [PATCH 4/4] x86/wakeup/sleep: Use pvops functions for changing GDT entries Konrad Rzeszutek Wilk
@ 2013-04-11 23:02   ` tip-bot for konrad@kernel.org
  0 siblings, 0 replies; 13+ messages in thread
From: tip-bot for konrad@kernel.org @ 2013-04-11 23:02 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, konrad, hpa, mingo, konrad.wilk, tglx, rjw, hpa

Commit-ID:  4d681be3c33dd74efffbe2a8f70634f7128602ec
Gitweb:     http://git.kernel.org/tip/4d681be3c33dd74efffbe2a8f70634f7128602ec
Author:     konrad@kernel.org <konrad@kernel.org>
AuthorDate: Fri, 5 Apr 2013 16:42:24 -0400
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Thu, 11 Apr 2013 15:41:15 -0700

x86, wakeup, sleep: Use pvops functions for changing GDT entries

We check the TSS descriptor before we try to dereference it.
Also we document what the value '9' actually means using the
AMD64 Architecture Programmer's Manual Volume 2, pg 90:
"Hex value 9: Available 64-bit TSS" and pg 91:
"The available 32-bit TSS (09h), which is redefined as the
available 64-bit TSS."

Without this, on Xen, where the GDT is available as R/O (to
protect the hypervisor from the guest modifying it), we end up
with a pagetable fault.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/1365194544-14648-5-git-send-email-konrad.wilk@oracle.com
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/power/cpu.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 82c39c5..168da84 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -132,7 +132,10 @@ static void fix_processor_context(void)
 {
 	int cpu = smp_processor_id();
 	struct tss_struct *t = &per_cpu(init_tss, cpu);
-
+#ifdef CONFIG_X86_64
+	struct desc_struct *desc = get_cpu_gdt_table(cpu);
+	tss_desc tss;
+#endif
 	set_tss_desc(cpu, t);	/*
 				 * This just modifies memory; should not be
 				 * necessary. But... This is necessary, because
@@ -141,7 +144,9 @@ static void fix_processor_context(void)
 				 */
 
 #ifdef CONFIG_X86_64
-	get_cpu_gdt_table(cpu)[GDT_ENTRY_TSS].type = 9;
+	memcpy(&tss, &desc[GDT_ENTRY_TSS], sizeof(tss_desc));
+	tss.type = 0x9; /* The available 64-bit TSS (see AMD vol 2, pg 91 */
+	write_gdt_entry(desc, GDT_ENTRY_TSS, &tss, DESC_TSS);
 
 	syscall_init();				/* This sets MSR_*STAR and related */
 #endif

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [tip:x86/paravirt] x86-64, gdt: Store/ load GDT for ACPI S3 or hibernate/resume path is not needed.
  2013-04-11 22:59   ` [tip:x86/paravirt] x86-64, gdt: Store/ load " tip-bot for Konrad Rzeszutek Wilk
@ 2013-04-12 12:02     ` Rafael J. Wysocki
  2013-04-30 21:25       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 13+ messages in thread
From: Rafael J. Wysocki @ 2013-04-12 12:02 UTC (permalink / raw)
  To: konrad.wilk; +Cc: mingo, hpa, linux-kernel, tglx, hpa, linux-tip-commits

On Thursday, April 11, 2013 03:59:11 PM tip-bot for Konrad Rzeszutek Wilk wrote:
> Commit-ID:  e7a5cd063c7b4c58417f674821d63f5eb6747e37
> Gitweb:     http://git.kernel.org/tip/e7a5cd063c7b4c58417f674821d63f5eb6747e37
> Author:     Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> AuthorDate: Fri, 5 Apr 2013 16:42:21 -0400
> Committer:  H. Peter Anvin <hpa@linux.intel.com>
> CommitDate: Thu, 11 Apr 2013 15:39:38 -0700
> 
> x86-64, gdt: Store/load GDT for ACPI S3 or hibernate/resume path is not needed.
> 
> During the ACPI S3 resume path the trampoline code handles it already.
> 
> During the ACPI S3 suspend phase (acpi_suspend_lowlevel) we set:
> early_gdt_descr.address = (..)get_cpu_gdt_table(smp_processor_id());
> 
> which is then used during the resume path and has the same exact
> value as what the store/load_gdt do with the saved_context
> (which is saved/restored via save/restore_processor_state()).
> 
> The flow during resume is complex and for 64-bit kernels we use three GDTs
> - one early bootstrap GDT (wakeup_igdt) that we load to workaround
> broken BIOSes, an early Protected Mode to Long Mode transition one
> (tr_gdt), and the final one - early_gdt_descr (which points to the real GDT).
> 
> The early ('wakeup_gdt') is loaded in 'trampoline_start' for working
> around broken BIOSes, and then when we end up in Protected Mode in the
> startup_32 (in trampoline_64.s, not head_32.s) we use the 'tr_gdt'
> (still in trampoline_64.s). This 'tr_gdt' has a a 32-bit code segment,
> 64-bit code segment with L=1, and a 32-bit data segment.
> 
> Once we have transitioned from Protected Mode to Long Mode we then
> set the GDT to 'early_gdt_desc' and then via an iretq emerge in
> wakeup_long64 (set via 'initial_code' variable in acpi_suspend_lowlevel).
> 
> In the wakeup_long64 we end up restoring the %rip (which is set to
> 'resume_point') and jump there.
> 
> In 'resume_point' we call 'restore_processor_state' which does
> the load_gdt on the saved context. This load_gdt is redundant as the
> GDT loaded via early_gdt_desc is the same.
> 
> Here is the call-chain:
>  wakeup_start
>    |- lgdtl wakeup_gdt [the work-around broken BIOSes]
>    |
>    \-- trampoline_start (trampoline_64.S)
>          |- lgdtl tr_gdt
>          |
>          \-- startup_32 (trampoline_64.S)
>                |
>                \-- startup_64 (trampoline_64.S)
>                       |
>                       \-- secondary_startup_64
>                                |- lgdtl early_gdt_desc
>                                | ...
>                                |- movq initial_code(%rip), %eax
>                                |-.. lretq
>                                \-- wakeup_64
>                                      |-- other registers are reloaded
>                                      |-- call restore_processor_state
> 
> The hibernate path is much simpler. During the saving of the hibernation
> image we call save_processor_state() and save the contents of that along
> with the rest of the kernel in the hibernation image destination.
> We save the EIP of 'restore_registers' (restore_jump_address) and cr3
> (restore_cr3).
> 
> During hibernate resume, the 'restore_registers' (via the
> 'restore_jump_address) in hibernate_asm_64.S is invoked which restores
> the contents of most registers. Naturally the resume path benefits from
> already being in 64-bit mode, so it does not have to load the GDT.
> 
> It only reloads the cr3 (from restore_cr3) and continues on. Note that
> the restoration of the restore image page-tables is done prior to this.
> 
> After the 'restore_registers' it returns and we end up called
> restore_processor_state() - where we reload the GDT. The reload of
> the GDT is not needed as bootup kernel has already loaded the GDT which
> is at the same physical location as the the restored kernel.

I'm not sure if this particular statement is actually correct.  It is correct
on 32-bit, but here it is not necessary for the bootup kernel to be the same
as the image one.  Different kernel version may be used for that even (at
least theoretically).  So the question is, and I'm quite unsure about the
answer, if the GDT of from the bootup kernel is really *guaranteed* to be
at the same location (given that those kernels may be really different).

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [tip:x86/paravirt] x86-64, gdt: Store/ load GDT for ACPI S3 or hibernate/resume path is not needed.
  2013-04-12 12:02     ` Rafael J. Wysocki
@ 2013-04-30 21:25       ` Konrad Rzeszutek Wilk
  2013-04-30 22:38         ` Rafael J. Wysocki
  0 siblings, 1 reply; 13+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-04-30 21:25 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: mingo, hpa, linux-kernel, tglx, hpa, linux-tip-commits

> > After the 'restore_registers' it returns and we end up called
> > restore_processor_state() - where we reload the GDT. The reload of
> > the GDT is not needed as bootup kernel has already loaded the GDT which
> > is at the same physical location as the the restored kernel.
> 
> I'm not sure if this particular statement is actually correct.  It is correct
> on 32-bit, but here it is not necessary for the bootup kernel to be the same
> as the image one.  Different kernel version may be used for that even (at
> least theoretically).  So the question is, and I'm quite unsure about the
> answer, if the GDT of from the bootup kernel is really *guaranteed* to be
> at the same location (given that those kernels may be really different).

A bit of testing with different bootup kernel provided me with this error:
PM: Image mismatch: version

which after a bit of digging pointed me to 'check_image_kernel'. The criteria
there imply that the different kernel versions or releases cannot be used with
hibernation.

For "fun" I subverted the checks and tried resuming a differently compiled
kernel (CONFIG_KALLSYMS_ALL=y) vs the one that was booting
(# CONFIG_KALLSYMS_ALL is not set) and found out that it does not work on a vanilla
v3.9 kernel.

diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index 0de2857..8589e05 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -1633,10 +1633,16 @@ static char *check_image_kernel(struct swsusp_info *info)
 		return "kernel version";
 	if (strcmp(info->uts.sysname,init_utsname()->sysname))
 		return "system type";
-	if (strcmp(info->uts.release,init_utsname()->release))
-		return "kernel release";
-	if (strcmp(info->uts.version,init_utsname()->version))
-		return "version";
+	if (strcmp(info->uts.release,init_utsname()->release)) {
+		printk(KERN_ERR "%s != %s .. continuing on.\n", info->uts.release,
+			init_utsname()->release);
+		return NULL;
+	}
+	if (strcmp(info->uts.version,init_utsname()->version)) {
+		printk(KERN_ERR "%s != %s .. continuing on.\n", info->uts.version,
+			init_utsname()->version);
+		return NULL;
+	}
 	if (strcmp(info->uts.machine,init_utsname()->machine))
 		return "machine";
 	return NULL;

Meaning without these patches we do crash when trying to load a different kernel.

Here is the log:

# echo "8:1" > /sys/power/resume
[   18.881169] PM: Starting manual resume from disk
[   18.884164] PM: Hibernation image partition 8:1 present
[   18.885418] PM: Looking for hibernation image.
[   18.888344] PM: Image signature found, resuming
[   18.895387] PM: Marking nosave pages: [mem 0x0009f000-0x000fffff]
[   18.896892] PM: Basic memory bitmaps created
[   18.897937] PM: Preparing processes for restore.
[   18.900102] Freezing user space processes ... (elapsed 0.01 seconds) done.
[   18.914402] PM: Loading hibernation image.
[   19.201178] PM: Using 2 thread(s) for decompression.
[   19.201178] PM: Loading and decompressing image data (53823 pages)...
[   19.210490] 3.9.0upstream != 3.9.0upstream-dirty .. continuing on.
[   19.874986] PM: Image loading progress:   0%
[   20.027158] PM: Image loading progress:  10%
[   20.177723] PM: Image loading progress:  20%
[   20.335267] PM: Image loading progress:  30%
[   20.491458] PM: Image loading progress:  40%
[   20.578486] PM: Image loading progress:  50%
[   20.663751] PM: Image loading progress:  60%
[   20.985474] PM: Image loading progress:  70%
[   21.112780] PM: Image loading progress:  80%
[   21.219476] PM: Image loading progress:  90%
[   21.336482] PM: Image loading progress: 100%
[   21.340302] PM: Image loading done.
[   21.342830] PM: Read 215292 kbytes in 2.12 seconds (101.55 MB/s)
[   21.350612] PM: Image successfully loaded
[   21.353913] Suspending console(s) (use no_console_suspend to debug)

.. and the machine either reboots or just hangs

So I think the assumptions I made are OK?

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [tip:x86/paravirt] x86-64, gdt: Store/ load GDT for ACPI S3 or hibernate/resume path is not needed.
  2013-04-30 21:25       ` Konrad Rzeszutek Wilk
@ 2013-04-30 22:38         ` Rafael J. Wysocki
  2013-05-01  0:48           ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 13+ messages in thread
From: Rafael J. Wysocki @ 2013-04-30 22:38 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: mingo, hpa, linux-kernel, tglx, hpa, linux-tip-commits

On Tuesday, April 30, 2013 05:25:42 PM Konrad Rzeszutek Wilk wrote:
> > > After the 'restore_registers' it returns and we end up called
> > > restore_processor_state() - where we reload the GDT. The reload of
> > > the GDT is not needed as bootup kernel has already loaded the GDT which
> > > is at the same physical location as the the restored kernel.
> > 
> > I'm not sure if this particular statement is actually correct.  It is correct
> > on 32-bit, but here it is not necessary for the bootup kernel to be the same
> > as the image one.  Different kernel version may be used for that even (at
> > least theoretically).  So the question is, and I'm quite unsure about the
> > answer, if the GDT of from the bootup kernel is really *guaranteed* to be
> > at the same location (given that those kernels may be really different).
> 
> A bit of testing with different bootup kernel provided me with this error:
> PM: Image mismatch: version

This implies that you tested on an architechture without
CONFIG_ARCH_HIBERNATION_HEADER.

> which after a bit of digging pointed me to 'check_image_kernel'. The criteria
> there imply that the different kernel versions or releases cannot be used with
> hibernation.

Only if CONFIG_ARCH_HIBERNATION_HEADER is unset, which is not the case for
CONFIG_X86_64.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [tip:x86/paravirt] x86-64, gdt: Store/ load GDT for ACPI S3 or hibernate/resume path is not needed.
  2013-04-30 22:38         ` Rafael J. Wysocki
@ 2013-05-01  0:48           ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 13+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-05-01  0:48 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: mingo, hpa, linux-kernel, tglx, hpa, linux-tip-commits

On Wed, May 01, 2013 at 12:38:17AM +0200, Rafael J. Wysocki wrote:
> On Tuesday, April 30, 2013 05:25:42 PM Konrad Rzeszutek Wilk wrote:
> > > > After the 'restore_registers' it returns and we end up called
> > > > restore_processor_state() - where we reload the GDT. The reload of
> > > > the GDT is not needed as bootup kernel has already loaded the GDT which
> > > > is at the same physical location as the the restored kernel.
> > > 
> > > I'm not sure if this particular statement is actually correct.  It is correct
> > > on 32-bit, but here it is not necessary for the bootup kernel to be the same
> > > as the image one.  Different kernel version may be used for that even (at
> > > least theoretically).  So the question is, and I'm quite unsure about the
> > > answer, if the GDT of from the bootup kernel is really *guaranteed* to be
> > > at the same location (given that those kernels may be really different).
> > 
> > A bit of testing with different bootup kernel provided me with this error:
> > PM: Image mismatch: version
> 
> This implies that you tested on an architechture without
> CONFIG_ARCH_HIBERNATION_HEADER.
> 
> > which after a bit of digging pointed me to 'check_image_kernel'. The criteria
> > there imply that the different kernel versions or releases cannot be used with
> > hibernation.
> 
> Only if CONFIG_ARCH_HIBERNATION_HEADER is unset, which is not the case for
> CONFIG_X86_64.

OK, Let me play on 64-bit later on this week and see if the patch I cobbled up
to save/restore the GDT in the booting kernel properly works.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2013-05-01  0:48 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-04-05 20:42 [RFC PATCH] axe the store_gdt() pvops call. (v1) Konrad Rzeszutek Wilk
2013-04-05 20:42 ` [PATCH 1/4] x86/gdt/64-bit: store/load GDT for ACPI S3 or hibernate/resume path is not needed Konrad Rzeszutek Wilk
2013-04-11 22:59   ` [tip:x86/paravirt] x86-64, gdt: Store/ load " tip-bot for Konrad Rzeszutek Wilk
2013-04-12 12:02     ` Rafael J. Wysocki
2013-04-30 21:25       ` Konrad Rzeszutek Wilk
2013-04-30 22:38         ` Rafael J. Wysocki
2013-05-01  0:48           ` Konrad Rzeszutek Wilk
2013-04-05 20:42 ` [PATCH 2/4] x86/gdt/i386: store/load GDT for ACPI S3 or hibernation/resume " Konrad Rzeszutek Wilk
2013-04-11 23:00   ` [tip:x86/paravirt] x86-32, gdt: Store/ load " tip-bot for Konrad Rzeszutek Wilk
2013-04-05 20:42 ` [PATCH 3/4] x86/xen/store_gdt: Remove the pvops variant of store_gdt Konrad Rzeszutek Wilk
2013-04-11 23:01   ` [tip:x86/paravirt] x86, xen, gdt: " tip-bot for Konrad Rzeszutek Wilk
2013-04-05 20:42 ` [PATCH 4/4] x86/wakeup/sleep: Use pvops functions for changing GDT entries Konrad Rzeszutek Wilk
2013-04-11 23:02   ` [tip:x86/paravirt] x86, wakeup, sleep: " tip-bot for konrad@kernel.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox