* [edk2] apparent KVM problem with LRET in TianoCore S3 resume trampoline
@ 2013-12-05 16:12 Laszlo Ersek
2013-12-05 16:50 ` Laszlo Ersek
2013-12-05 17:42 ` Paolo Bonzini
0 siblings, 2 replies; 16+ messages in thread
From: Laszlo Ersek @ 2013-12-05 16:12 UTC (permalink / raw)
To: KVM devel mailing list; +Cc: edk2-devel@lists.sourceforge.net
Hi,
I'm working on S3 suspend/resume in OVMF. The problem is that I'm getting an
unexpected guest reboot for code (LRET) that works on physical hardware. I
tried to trace the problem with ftrace, but I didn't get any mentions of
em_ret_far(). (Maybe I was looking in the wrong place.)
Please find below the assembly-language "trampoline" that is invoked (in 64-bit
mode) with the 16-bit real mode resume vector placed in "rcx" (EFIAPI calling
convention). The excerpt is from the edk2 tree,
"MdeModulePkg/Universal/Acpi/BootScriptExecutorDxe/X64/S3Asm.S".
I'm annotating the source code to the right -- please excuse my audacity as I
know you all eat assembly for breakfast, but maybe it will speed up your
processing. (Or perhaps I'll sneakily confuse you with my errors :))
ASM_GLOBAL ASM_PFX(AsmTransferControl) #
ASM_PFX(AsmTransferControl): #
# rcx S3WakingVector :DWORD # ecx: ........ ....PPPP QQQQQQQQ RRRRSSSS
# rdx AcpiLowMemoryBase :DWORD #
lea _AsmTransferControl_al_0000(%rip), %eax # pushing $0x28 for CS
movq $0x2800000000, %r8 # and address of
orq %r8, %rax # _AsmTransferControl_al_0000
pushq %rax # for RIP
shrd $20, %ecx, %ebx # ebx: PPPPQQQQ QQQQRRRR SSSS.... ........
andl $0x0f, %ecx # ecx: 00000000 00000000 00000000 0000SSSS
movw %cx, %bx # ebx: PPPPQQQQ QQQQRRRR 00000000 0000SSSS
movl %ebx, jmp_addr(%rip) # stores vector as 16-bit segment:offset pair
xxxx: # -- my own loop
jmp xxxx # -- for debugging
lret # (*) TRIGGERS REBOOT
_AsmTransferControl_al_0000: #
.byte 0x0b8, 0x30, 0 # mov ax, 30h as selector #
movl %eax, %ds #
movl %eax, %es #
movl %eax, %fs #
movl %eax, %gs #
movl %eax, %ss #
movq %cr0, %rax #
movq %cr4, %rbx #
.byte 0x66 # (**)
andl $0x7ffffffe, %eax # preps for turning off Paging and Protection Enable
andb $0xdf, %bl # preps for turning off PAE
movq %rax, %cr0 # Paging and PE off
.byte 0x66 # (**)
movl $0x0c0000080, %ecx #
rdmsr #
andb $0xfe, %ah #
wrmsr # IA-32e Mode Enable off
movq %rbx, %cr4 # PAE off
.byte 0x0ea # jmp far jmp_addr #
jmp_addr: #
.long 0 # PPPPQQQQ QQQQRRRR:SSSS
The small loop at xxxx is my debug loop. The "lret" instruction right after
(marked with (*)) triggers a reboot in KVM.
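As a cross-check of the shrd / andl / movw sequence in the listing, here is a minimal Python sketch of the same bit manipulation. The helper names are hypothetical (they are not from edk2), and the sketch assumes the waking vector is a physical address below 1 MB, so that bits 19..4 become the real-mode segment and bits 3..0 become the offset.

```python
from typing import Tuple


def pack_real_mode_vector(waking_vector: int) -> int:
    """Mirror shrd $20 / andl $0x0f / movw %cx,%bx from the listing.

    Returns the 32-bit value stored at jmp_addr: segment in the high
    16 bits, offset in the low 16 bits. Hypothetical helper name.
    """
    segment = (waking_vector >> 4) & 0xFFFF   # bits 19..4 -> PPPPQQQQ QQQQRRRR
    offset = waking_vector & 0x0F             # bits 3..0  -> SSSS
    return (segment << 16) | offset


def unpack_far_pointer(value: int) -> Tuple[int, int]:
    """Split the stored .long back into (segment, offset)."""
    return (value >> 16) & 0xFFFF, value & 0xFFFF


packed = pack_real_mode_vector(0x9A1D0)
segment, offset = unpack_far_pointer(packed)
print(hex(packed), hex(segment), hex(offset))
```

With a waking vector of 0x9A1D0 the packed value is 0x9A1D0000, which agrees with the RBX value in the register dump that follows.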
In the loop, this is the register dump (taken with the "info registers" qemu
monitor command):
RAX=000000289c75be2b RBX=000000009a1d0000 RCX=0000000000000000 RDX=0000000000000000
RSI=0000000000000000 RDI=0000000000000000 RBP=000000009f7bafd0 RSP=000000009f7bae30
R8 =0000002800000000 R9 =0000000000000000 R10=00000000008454cd R11=0000000000000000
R12=0000000000000000 R13=0000000000000000 R14=00000000008454c6 R15=0000000000000000
RIP=000000009c75be28 RFL=00000046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
CS =0018 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
DS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
GDT= 0000000000844c80 00000047
IDT= 000000009c01fd60 0000021f
CR0=80000033 CR2=0000000000000000 CR3=0000000000080000 CR4=00000660
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000500
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000
XMM08=00000000000000000000000000000000 XMM09=00000000000000000000000000000000
XMM10=00000000000000000000000000000000 XMM11=00000000000000000000000000000000
XMM12=00000000000000000000000000000000 XMM13=00000000000000000000000000000000
XMM14=00000000000000000000000000000000 XMM15=00000000000000000000000000000000
Right before the function call in the C source, I also read the CS register,
the GDTR, and the GDT entries (please excuse the long lines):
S3ResumeBootOs: CS=0x0018
S3ResumeBootOs: Desc.Limit=0x0047
0x0000: 0x0000000000000000: Base=0x00000000 Limit=0x00000 Type=0x0 (D RO ) S=0x0 (system ) DPL=0x0 Present=0 Avail=0 64-bitC=0 D/B=0 LimitGran=0x0 (1B )
0x0008: 0x0000000000000000: Base=0x00000000 Limit=0x00000 Type=0x0 (D RO ) S=0x0 (system ) DPL=0x0 Present=0 Avail=0 64-bitC=0 D/B=0 LimitGran=0x0 (1B )
0x0010: 0x00CF9B000000FFFF: Base=0x00000000 Limit=0xFFFFF Type=0xB (C ER A ) S=0x1 (code/data) DPL=0x0 Present=1 Avail=0 64-bitC=0 D/B=1 LimitGran=0x1 (4KB)
0x0018: 0x00CF93000000FFFF: Base=0x00000000 Limit=0xFFFFF Type=0x3 (D RW A ) S=0x1 (code/data) DPL=0x0 Present=1 Avail=0 64-bitC=0 D/B=1 LimitGran=0x1 (4KB)
0x0020: 0x0000000000000000: Base=0x00000000 Limit=0x00000 Type=0x0 (D RO ) S=0x0 (system ) DPL=0x0 Present=0 Avail=0 64-bitC=0 D/B=0 LimitGran=0x0 (1B )
0x0028: 0x008F9B000000FFFF: Base=0x00000000 Limit=0xFFFFF Type=0xB (C ER A ) S=0x1 (code/data) DPL=0x0 Present=1 Avail=0 64-bitC=0 D/B=0 LimitGran=0x1 (4KB)
0x0030: 0x008F93000000FFFF: Base=0x00000000 Limit=0xFFFFF Type=0x3 (D RW A ) S=0x1 (code/data) DPL=0x0 Present=1 Avail=0 64-bitC=0 D/B=0 LimitGran=0x1 (4KB)
0x0038: 0x00AF9B000000FFFF: Base=0x00000000 Limit=0xFFFFF Type=0xB (C ER A ) S=0x1 (code/data) DPL=0x0 Present=1 Avail=0 64-bitC=1 D/B=0 LimitGran=0x1 (4KB)
0x0040: 0x0000000000000000: Base=0x00000000 Limit=0x00000 Type=0x0 (D RO ) S=0x0 (system ) DPL=0x0 Present=0 Avail=0 64-bitC=0 D/B=0 LimitGran=0x0 (1B )
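For readers who want to re-derive the fields in the table above from the raw 64-bit values, the following is a small Python sketch of a legacy segment descriptor decoder. The function name and the dictionary keys are illustrative choices, not part of the OVMF debug code that produced the dump; the bit positions follow the usual x86 descriptor layout.

```python
def decode_descriptor(raw: int) -> dict:
    """Decode an 8-byte code/data segment descriptor (illustrative helper)."""
    limit = (raw & 0xFFFF) | (((raw >> 48) & 0xF) << 16)
    base = ((raw >> 16) & 0xFFFFFF) | (((raw >> 56) & 0xFF) << 24)
    return {
        "base": base,
        "limit": limit,
        "type": (raw >> 40) & 0xF,     # 0xB = code, execute/read, accessed
        "s": (raw >> 44) & 1,          # 1 = code/data, 0 = system
        "dpl": (raw >> 45) & 3,
        "present": (raw >> 47) & 1,
        "avail": (raw >> 52) & 1,
        "l": (raw >> 53) & 1,          # "64-bitC" in the table
        "db": (raw >> 54) & 1,         # "D/B" in the table
        "gran": (raw >> 55) & 1,       # "LimitGran": 1 = 4KB units
    }


# GDTR limit 0x47 means 0x48 bytes, i.e. nine 8-byte entries (0x00..0x40).
entries = (0x47 + 1) // 8

for selector, raw in [(0x10, 0x00CF9B000000FFFF),
                      (0x28, 0x008F9B000000FFFF),
                      (0x38, 0x00AF9B000000FFFF)]:
    d = decode_descriptor(raw)
    print(hex(selector), "L =", d["l"], "D/B =", d["db"], "limit =", hex(d["limit"]))
```

The three code descriptors differ only in the L and D/B bits, which is the distinction the rest of this message relies on.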
The purpose of the LRET would be (by way of selecting CS=0x0028) to select
compat mode code execution (64-bitC=0), and to turn off the D/B bit, ie. set
default address & operand size to 16 bits. (This is the justification for the
0x66 prefixes I marked with (**) in the assembly.)
Interesting things:
- CS is currently 0x18, which describes a data segment. Strange (but works).
- If I select 0x38 or 0x10 as CS, then the LRET works fine (as in, I reach the
target label.)
- The offending segment descriptor (at 0x28) differs from these other working
segment descriptors in the following small details:
- shared properties:
Base=0x00000000
Limit=0xFFFFF
Type=0xB (C ER A )
S=0x1 (code/data)
DPL=0x0
Present=1
Avail=0
LimitGran=0x1 (4KB)
- different properties:
0x0010: 0x00CF9B000000FFFF: 64-bitC=0 D/B=1 works
0x0028: 0x008F9B000000FFFF: 64-bitC=0 D/B=0 reboots
0x0038: 0x00AF9B000000FFFF: 64-bitC=1 D/B=0 works
That is:
- If I leave 64-bit mode execution enabled (64-bitC=1, desc 0x38), the lret
works.
- If I switch to compat mode execution (64-bitC=0, desc 0x10), *and* keep the
default addr/op size 32 bits, the lret still works.
- If I switch to compat mode execution (64-bitC=0, desc 0x28), but also change
the default addr/op size to 16-bits, then the lret reboots the guest in KVM
(but works on physical hardware).
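The three cases above can be summarized as a lookup on the L and D/B bits of the target code segment. This is only a sketch: the function names are made up for illustration, and it assumes the CPU is still in IA-32e mode (EFER.LMA=1) when the lret executes, which is the situation described here.

```python
def selector_index(selector: int) -> int:
    """GDT index encoded in a selector (bits 15..3); TI and RPL are ignored."""
    return selector >> 3


def cs_mode(l_bit: int, db_bit: int) -> str:
    """Execution sub-mode selected by CS.L and CS.D while EFER.LMA=1."""
    if l_bit and db_bit:
        return "reserved combination"
    if l_bit:
        return "64-bit mode"
    if db_bit:
        return "compatibility mode, 32-bit default"
    return "compatibility mode, 16-bit default"


# (L, D/B) per selector, plus the outcome reported in this message.
observed = {
    0x10: ((0, 1), "works"),
    0x28: ((0, 0), "reboots under KVM"),
    0x38: ((1, 0), "works"),
}

for selector, ((l_bit, db_bit), outcome) in observed.items():
    print(hex(selector), "GDT index", selector_index(selector),
          "->", cs_mode(l_bit, db_bit), "->", outcome)
```

Only the 16-bit compatibility-mode target fails, so the symptom tracks the combination L=0, D/B=0 rather than either bit alone.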
Host:
- Intel(R) Core(TM) i7 CPU M 620 @ 2.67GHz
- KVM parameters (all left at default):
emulate_invalid_guest_state: Y
enable_apicv: N
enable_shadow_vmcs: N
ept: Y
eptad: N
fasteoi: Y
flexpriority: Y
nested: N
ple_gap: 0
ple_window: 4096
unrestricted_guest: Y
vmm_exclusive: Y
vpid: Y
- KVM: 3.11
- qemu: at 7dc65c02 ("Open 2.0 development tree")
- guest RAM size: 2560 MB (0xA0000000 bytes)
I'm also pasting an objdump disassembly of the routine below (compiled without
my small debug loop). The disassembly is kind of garbled (eg. the movabs and
the 32-bit code), but the hexdump might be helpful.
Please keep me CC'd, I'm not subscribed.
Thank you!
Laszlo
0000000000000000 <AsmTransferControl>:
0: 8d 05 1f 00 00 00 lea 0x1f(%rip),%eax # 25 <_AsmTransferControl_al_0000>
6: 49 b8 00 00 00 00 28 movabs $0x2800000000,%r8
d: 00 00 00
10: 4c 09 c0 or %r8,%rax
13: 50 push %rax
14: 0f ac cb 14 shrd $0x14,%ecx,%ebx
18: 83 e1 0f and $0xf,%ecx
1b: 66 89 cb mov %cx,%bx
1e: 89 1d 31 00 00 00 mov %ebx,0x31(%rip) # 55 <jmp_addr>
24: cb lret
0000000000000025 <_AsmTransferControl_al_0000>:
25: b8 30 00 8e d8 mov $0xd88e0030,%eax
2a: 8e c0 mov %eax,%es
2c: 8e e0 mov %eax,%fs
2e: 8e e8 mov %eax,%gs
30: 8e d0 mov %eax,%ss
32: 0f 20 c0 mov %cr0,%rax
35: 0f 20 e3 mov %cr4,%rbx
38: 66 25 fe ff and $0xfffe,%ax
3c: ff (bad)
3d: 7f 80 jg ffffffffffffffbf <L1+0xfffffffffffffeb8>
3f: e3 df jrcxz 20 <AsmTransferControl+0x20>
41: 0f 22 c0 mov %rax,%cr0
44: 66 b9 80 00 mov $0x80,%cx
48: 00 c0 add %al,%al
4a: 0f 32 rdmsr
4c: 80 e4 fe and $0xfe,%ah
4f: 0f 30 wrmsr
51: 0f 22 e3 mov %rbx,%cr4
54: ea (bad)
0000000000000055 <jmp_addr>:
55: 00 00 add %al,(%rax)
...
------------------------------------------------------------------------------
Sponsored by Intel(R) XDK
Develop, test and display web and hybrid apps with a single code base.
Download it for free now!
http://pubads.g.doubleclick.net/gampad/clk?id=111408631&iu=/4140/ostg.clktrk
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [edk2] apparent KVM problem with LRET in TianoCore S3 resume trampoline
2013-12-05 16:12 [edk2] apparent KVM problem with LRET in TianoCore S3 resume trampoline Laszlo Ersek
@ 2013-12-05 16:50 ` Laszlo Ersek
2013-12-05 17:42 ` Paolo Bonzini
1 sibling, 0 replies; 16+ messages in thread
From: Laszlo Ersek @ 2013-12-05 16:50 UTC (permalink / raw)
To: KVM devel mailing list; +Cc: edk2-devel
Small addition -- apologies for the self-followup:
On 12/05/13 17:12, Laszlo Ersek wrote:
> I tried to trace the problem with ftrace, but I didn't get any mentions of
> em_ret_far(). (Maybe I was looking in the wrong place.)
I applied the following small patch (to the original code):
diff --git a/MdeModulePkg/Universal/Acpi/BootScriptExecutorDxe/X64/S3Asm.S b/MdeModulePkg/Universal/Acpi/BootScriptExecutorDxe/X64/S3Asm.S
index e59fd04..daa4f7e 100644
--- a/MdeModulePkg/Universal/Acpi/BootScriptExecutorDxe/X64/S3Asm.S
+++ b/MdeModulePkg/Universal/Acpi/BootScriptExecutorDxe/X64/S3Asm.S
@@ -18,8 +18,8 @@ ASM_GLOBAL ASM_PFX(AsmTransferControl)
ASM_PFX(AsmTransferControl):
# rcx S3WakingVector :DWORD
# rdx AcpiLowMemoryBase :DWORD
- lea _AsmTransferControl_al_0000(%rip), %eax
- movq $0x2800000000, %r8
+ lea AsmTransferControl(%rip), %eax
+ movq $0x3800000000, %r8
orq %r8, %rax
pushq %rax
shrd $20, %ecx, %ebx
This turns the code right under AsmTransferControl into a working, 64-bit mode
loop. (Recall that 0x38 selects a descriptor that has the L ("64-bitC") bit
set:
> 0x0038: 0x00AF9B000000FFFF: Base=0x00000000 Limit=0xFFFFF Type=0xB (C ER A ) S=0x1 (code/data) DPL=0x0 Present=1 Avail=0 64-bitC=1 D/B=0 LimitGran=0x1 (4KB)
)
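To make the role of the 0x2800000000 / 0x3800000000 constants explicit, here is a hedged Python sketch of the stack frame that the single pushq builds and that the lret consumes. It assumes the lret is encoded without a REX.W prefix (plain 0xCB, as in the objdump output), in which case the far return in 64-bit mode uses a 32-bit operand size and pops a 4-byte offset followed by a 4-byte slot holding the selector. The helper names are illustrative.

```python
import struct
from typing import Tuple


def build_lret_frame(selector: int, offset: int) -> bytes:
    """Bytes written by 'orq %r8, %rax; pushq %rax' (little-endian quadword)."""
    return struct.pack("<Q", (selector << 32) | (offset & 0xFFFFFFFF))


def pop_lret_frame(frame: bytes) -> Tuple[int, int]:
    """What a 32-bit-operand far return reads back: (CS selector, EIP)."""
    eip, cs_slot = struct.unpack("<II", frame)
    return cs_slot & 0xFFFF, eip


# RAX in the register dump was 0x000000289c75be2b.
frame = build_lret_frame(0x28, 0x9C75BE2B)
print(frame.hex(), [hex(x) for x in pop_lret_frame(frame)])
```

Changing the constant to 0x3800000000 alters only the selector half of the frame; the offset half still comes from the lea.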
While this was spinning (I checked the RIP several times with the qemu monitor
and it was alternating between a few close values -- ie. not stuck), I ran
trace-cmd. The report seems to confirm that the lret is not emulated, because
the only lines I'm seeing are:
qemu-system-x86-3901 [001] 38939.599663: kvm_exit: reason EXTERNAL_INTERRUPT rip 0x9c75be0a info 0 800000ef
qemu-system-x86-3901 [001] 38939.599684: kvm_entry: vcpu 0
repeated infinitely. The rip varies between a few close values,
458 rip 0x9c75be04
313 rip 0x9c75be0a
5 rip 0x9c75be17
4 rip 0x9c75be18
3 rip 0x9c75be22
8 rip 0x9c75be28
Thanks again and sorry for the noise.
Laszlo
* Re: [edk2] apparent KVM problem with LRET in TianoCore S3 resume trampoline
2013-12-05 16:12 [edk2] apparent KVM problem with LRET in TianoCore S3 resume trampoline Laszlo Ersek
2013-12-05 16:50 ` Laszlo Ersek
@ 2013-12-05 17:42 ` Paolo Bonzini
2013-12-05 18:29 ` Laszlo Ersek
2013-12-05 22:38 ` Laszlo Ersek
1 sibling, 2 replies; 16+ messages in thread
From: Paolo Bonzini @ 2013-12-05 17:42 UTC (permalink / raw)
To: edk2-devel; +Cc: KVM devel mailing list
Il 05/12/2013 17:12, Laszlo Ersek ha scritto:
> Hi,
>
> I'm working on S3 suspend/resume in OVMF. The problem is that I'm getting an
> unexpected guest reboot for code (LRET) that works on physical hardware. I
> tried to trace the problem with ftrace, but I didn't get any mentions of
> em_ret_far(). (Maybe I was looking in the wrong place.)
What does ftrace say anyway?
> Please find below the assembly-language "trampoline" that is invoked (in 64-bit
> mode) with the 16-bit real mode resume vector placed in "rcx" (EFIAPI calling
> convention). The excerpt is from the edk2 tree,
> "MdeModulePkg/Universal/Acpi/BootScriptExecutorDxe/X64/S3Asm.S".
Can you send me a pointer to a git tree or, even better, an OVMF.fd file + instructions
on how to trigger the problem?
> - shared properties:
> Base=0x00000000
> Limit=0xFFFFF
> Type=0xB (C ER A )
> S=0x1 (code/data)
> DPL=0x0
> Present=1
> Avail=0
> LimitGran=0x1 (4KB)
>
> - different properties:
> 0x0010: 0x00CF9B000000FFFF: 64-bitC=0 D/B=1 works
> 0x0028: 0x008F9B000000FFFF: 64-bitC=0 D/B=0 reboots
> 0x0038: 0x00AF9B000000FFFF: 64-bitC=1 D/B=0 works
>
> That is:
> - If I leave 64-bit mode execution enabled (64-bitC=1, desc 0x38), the lret
> works.
> - If I switch to compat mode execution (64-bitC=0, desc 0x10), *and* keep the
> default addr/op size 32 bits, the lret still works.
Perhaps you could try switching to 32-bit mode first, then disable paging,
then jump to 16-bit mode. Like this (untested):
diff --git a/MdeModulePkg/Universal/Acpi/BootScriptExecutorDxe/X64/S3Asm.S b/MdeModulePkg/Universal/Acpi/BootScriptExecutorDxe/X64/S3Asm.S
index e59fd04..d1cac9d 100644
--- a/MdeModulePkg/Universal/Acpi/BootScriptExecutorDxe/X64/S3Asm.S
+++ b/MdeModulePkg/Universal/Acpi/BootScriptExecutorDxe/X64/S3Asm.S
@@ -19,7 +19,7 @@ ASM_PFX(AsmTransferControl):
# rcx S3WakingVector :DWORD
# rdx AcpiLowMemoryBase :DWORD
lea _AsmTransferControl_al_0000(%rip), %eax
- movq $0x2800000000, %r8
+ movq $0x1000000000, %r8
orq %r8, %rax
pushq %rax
shrd $20, %ecx, %ebx
@@ -28,24 +28,32 @@ ASM_PFX(AsmTransferControl):
movl %ebx, jmp_addr(%rip)
lret
_AsmTransferControl_al_0000:
+ # Old SS should still be okay?
+ addl _AsmTransferControl_al_0001-_AsmTransferControl_al_0000, %eax
+ pushl $0x28
+ pushl %eax
+ movq %cr0, %rax
+ movq %cr4, %rbx
+ andl $0x7fffffff, %eax
+ andb $0xdf, %bl
+ movq %rax, %cr0 # sets EFER.LMA=0 too, so says Intel
+ movl $0x0c0000080, %ecx
+ rdmsr
+ andb $0xfe, %ah # set EFER.LME=0
+ wrmsr
+ movq %rbx, %cr4 # only now set CR4.PAE=0
+ lret
+_AsmTransferControl_al_0001:
.byte 0x0b8, 0x30, 0 # mov ax, 30h as selector
movl %eax, %ds
movl %eax, %es
movl %eax, %fs
movl %eax, %gs
movl %eax, %ss
- movq %cr0, %rax
- movq %cr4, %rbx
- .byte 0x66
- andl $0x7ffffffe, %eax
- andb $0xdf, %bl
- movq %rax, %cr0
- .byte 0x66
- movl $0x0c0000080, %ecx
- rdmsr
- andb $0xfe, %ah
- wrmsr
- movq %rbx, %cr4
+ movl %cr0, %rax # Get control register 0
+ .byte 0x66
+ .byte 0x83,0xe0,0xfe # and eax, 0fffffffeh ; Clear PE bit (bit #0)
+ .byte 0xf,0x22,0xc0 # mov cr0, eax ; Activate real mode
> - If I switch to compat mode execution (64-bitC=0, desc 0x28), but also change
> the default addr/op size to 16-bits, then the lret reboots the guest in KVM
> (but works on physical hardware).
Did you try this on physical hardware, or just assumed that? :)
Paolo
* Re: [edk2] apparent KVM problem with LRET in TianoCore S3 resume trampoline
2013-12-05 17:42 ` Paolo Bonzini
@ 2013-12-05 18:29 ` Laszlo Ersek
2013-12-06 12:03 ` Paolo Bonzini
2013-12-05 22:38 ` Laszlo Ersek
1 sibling, 1 reply; 16+ messages in thread
From: Laszlo Ersek @ 2013-12-05 18:29 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: edk2-devel, KVM devel mailing list
On 12/05/13 18:42, Paolo Bonzini wrote:
> Il 05/12/2013 17:12, Laszlo Ersek ha scritto:
>> Hi,
>>
>> I'm working on S3 suspend/resume in OVMF. The problem is that I'm getting an
>> unexpected guest reboot for code (LRET) that works on physical hardware. I
>> tried to trace the problem with ftrace, but I didn't get any mentions of
>> em_ret_far(). (Maybe I was looking in the wrong place.)
>
> What does ftrace say anyway?
(pls. see in the next msg I sent)
>
>> Please find below the assembly-language "trampoline" that is invoked (in 64-bit
>> mode) with the 16-bit real mode resume vector placed in "rcx" (EFIAPI calling
>> convention). The excerpt is from the edk2 tree,
>> "MdeModulePkg/Universal/Acpi/BootScriptExecutorDxe/X64/S3Asm.S".
>
> Can you send me a pointer to a git tree or, even better, an OVMF.fd file + instructions
> on how to trigger the problem?
http://people.redhat.com/~lersek/ovmf_s3_lret/
I use a stock F19 guest, with the default systemd target set to
multi-user. Then I login as root at the console, and issue "pm-suspend".
Once the guest is suspended, I type
virsh qemu-monitor-command ovmf.f19 --hmp system_wakeup
on the host.
At this point the guest starts spinning (visible eg. in the virt-manager
CPU usage chart), and the OVMF debug log (written to the qemu debug
console) is continuously growing. It always takes the same path: selects
the S3 boot path due to the 0xFE byte at 0xF in CMOS, progresses to the
trampoline, and resets.
>
>> - shared properties:
>> Base=0x00000000
>> Limit=0xFFFFF
>> Type=0xB (C ER A )
>> S=0x1 (code/data)
>> DPL=0x0
>> Present=1
>> Avail=0
>> LimitGran=0x1 (4KB)
>>
>> - different properties:
>> 0x0010: 0x00CF9B000000FFFF: 64-bitC=0 D/B=1 works
>> 0x0028: 0x008F9B000000FFFF: 64-bitC=0 D/B=0 reboots
>> 0x0038: 0x00AF9B000000FFFF: 64-bitC=1 D/B=0 works
>>
>> That is:
>> - If I leave 64-bit mode execution enabled (64-bitC=1, desc 0x38), the lret
>> works.
>> - If I switch to compat mode execution (64-bitC=0, desc 0x10), *and* keep the
>> default addr/op size 32 bits, the lret still works.
>
> Perhaps you could try switching to 32-bit mode first, then disable paging,
> then jump to 16-bit mode. Like this (untested):
I had something like this in mind (I even mentioned it on edk2-devel
<http://thread.gmane.org/gmane.comp.bios.tianocore.devel/5297/focus=5331>),
but didn't know how to implement it. There are at least 6 factors in
play here:
- the L bit in the segment descriptor,
- the D/B bit in the segment descriptor,
- Paging,
- Protection Enable,
- IA-32e Mode Enable,
- PAE.
That's (almost) 6! orderings to test :)
So thanks a lot for suggesting a patch, I'll try to play with it.
>
> diff --git a/MdeModulePkg/Universal/Acpi/BootScriptExecutorDxe/X64/S3Asm.S b/MdeModulePkg/Universal/Acpi/BootScriptExecutorDxe/X64/S3Asm.S
> index e59fd04..d1cac9d 100644
> --- a/MdeModulePkg/Universal/Acpi/BootScriptExecutorDxe/X64/S3Asm.S
> +++ b/MdeModulePkg/Universal/Acpi/BootScriptExecutorDxe/X64/S3Asm.S
> @@ -19,7 +19,7 @@ ASM_PFX(AsmTransferControl):
> # rcx S3WakingVector :DWORD
> # rdx AcpiLowMemoryBase :DWORD
> lea _AsmTransferControl_al_0000(%rip), %eax
> - movq $0x2800000000, %r8
> + movq $0x1000000000, %r8
> orq %r8, %rax
> pushq %rax
> shrd $20, %ecx, %ebx
> @@ -28,24 +28,32 @@ ASM_PFX(AsmTransferControl):
> movl %ebx, jmp_addr(%rip)
> lret
> _AsmTransferControl_al_0000:
> + # Old SS should still be okay?
> + addl _AsmTransferControl_al_0001-_AsmTransferControl_al_0000, %eax
> + pushl $0x28
> + pushl %eax
> + movq %cr0, %rax
> + movq %cr4, %rbx
> + andl $0x7fffffff, %eax
> + andb $0xdf, %bl
> + movq %rax, %cr0 # sets EFER.LMA=0 too, so says Intel
> + movl $0x0c0000080, %ecx
> + rdmsr
> + andb $0xfe, %ah # set EFER.LME=0
> + wrmsr
> + movq %rbx, %cr4 # only now set CR4.PAE=0
> + lret
> +_AsmTransferControl_al_0001:
> .byte 0x0b8, 0x30, 0 # mov ax, 30h as selector
> movl %eax, %ds
> movl %eax, %es
> movl %eax, %fs
> movl %eax, %gs
> movl %eax, %ss
> - movq %cr0, %rax
> - movq %cr4, %rbx
> - .byte 0x66
> - andl $0x7ffffffe, %eax
> - andb $0xdf, %bl
> - movq %rax, %cr0
> - .byte 0x66
> - movl $0x0c0000080, %ecx
> - rdmsr
> - andb $0xfe, %ah
> - wrmsr
> - movq %rbx, %cr4
> + movl %cr0, %rax # Get control register 0
> + .byte 0x66
> + .byte 0x83,0xe0,0xfe # and eax, 0fffffffeh ; Clear PE bit (bit #0)
> + .byte 0xf,0x22,0xc0 # mov cr0, eax ; Activate real mode
>
>> - If I switch to compat mode execution (64-bitC=0, desc 0x28), but also change
>> the default addr/op size to 16-bits, then the lret reboots the guest in KVM
>> (but works on physical hardware).
>
> Did you try this on physical hardware, or just assumed that? :)
I was told on edk2-devel O:)
http://thread.gmane.org/gmane.comp.bios.tianocore.devel/5297/focus=5333
Thanks!
Laszlo
* Re: [edk2] apparent KVM problem with LRET in TianoCore S3 resume trampoline
2013-12-05 17:42 ` Paolo Bonzini
2013-12-05 18:29 ` Laszlo Ersek
@ 2013-12-05 22:38 ` Laszlo Ersek
2013-12-05 22:53 ` Andrew Fish
2013-12-07 16:25 ` David Woodhouse
1 sibling, 2 replies; 16+ messages in thread
From: Laszlo Ersek @ 2013-12-05 22:38 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: edk2-devel, KVM devel mailing list
On 12/05/13 18:42, Paolo Bonzini wrote:
> diff --git a/MdeModulePkg/Universal/Acpi/BootScriptExecutorDxe/X64/S3Asm.S b/MdeModulePkg/Universal/Acpi/BootScriptExecutorDxe/X64/S3Asm.S
> index e59fd04..d1cac9d 100644
> --- a/MdeModulePkg/Universal/Acpi/BootScriptExecutorDxe/X64/S3Asm.S
> +++ b/MdeModulePkg/Universal/Acpi/BootScriptExecutorDxe/X64/S3Asm.S
> @@ -19,7 +19,7 @@ ASM_PFX(AsmTransferControl):
> # rcx S3WakingVector :DWORD
> # rdx AcpiLowMemoryBase :DWORD
> lea _AsmTransferControl_al_0000(%rip), %eax
> - movq $0x2800000000, %r8
> + movq $0x1000000000, %r8
> orq %r8, %rax
> pushq %rax
> shrd $20, %ecx, %ebx
> @@ -28,24 +28,32 @@ ASM_PFX(AsmTransferControl):
> movl %ebx, jmp_addr(%rip)
> lret
> _AsmTransferControl_al_0000:
> + # Old SS should still be okay?
> + addl _AsmTransferControl_al_0001-_AsmTransferControl_al_0000, %eax
> + pushl $0x28
> + pushl %eax
> + movq %cr0, %rax
> + movq %cr4, %rbx
> + andl $0x7fffffff, %eax
> + andb $0xdf, %bl
> + movq %rax, %cr0 # sets EFER.LMA=0 too, so says Intel
> + movl $0x0c0000080, %ecx
> + rdmsr
> + andb $0xfe, %ah # set EFER.LME=0
> + wrmsr
> + movq %rbx, %cr4 # only now set CR4.PAE=0
> + lret
> +_AsmTransferControl_al_0001:
> .byte 0x0b8, 0x30, 0 # mov ax, 30h as selector
> movl %eax, %ds
> movl %eax, %es
> movl %eax, %fs
> movl %eax, %gs
> movl %eax, %ss
> - movq %cr0, %rax
> - movq %cr4, %rbx
> - .byte 0x66
> - andl $0x7ffffffe, %eax
> - andb $0xdf, %bl
> - movq %rax, %cr0
> - .byte 0x66
> - movl $0x0c0000080, %ecx
> - rdmsr
> - andb $0xfe, %ah
> - wrmsr
> - movq %rbx, %cr4
> + movl %cr0, %rax # Get control register 0
> + .byte 0x66
> + .byte 0x83,0xe0,0xfe # and eax, 0fffffffeh ; Clear PE bit (bit #0)
> + .byte 0xf,0x22,0xc0 # mov cr0, eax ; Activate real mode
I had to add this incremental patch to get it to compile:
diff --git a/MdeModulePkg/Universal/Acpi/BootScriptExecutorDxe/X64/S3Asm.S b/MdeModulePkg/Universal/Acpi/BootScriptExecutorDxe/X64/S3Asm.S
index c28df3f..85d2a36 100644
--- a/MdeModulePkg/Universal/Acpi/BootScriptExecutorDxe/X64/S3Asm.S
+++ b/MdeModulePkg/Universal/Acpi/BootScriptExecutorDxe/X64/S3Asm.S
@@ -30,8 +30,8 @@ ASM_PFX(AsmTransferControl):
_AsmTransferControl_al_0000:
# Old SS should still be okay?
addl _AsmTransferControl_al_0001-_AsmTransferControl_al_0000, %eax
- pushl $0x28
- pushl %eax
+ .byte 0x6a,0x28 # pushl $0x28 ; opnd sz = 32bits in seg 0x10
+ .byte 0x50 # pushl %eax
movq %cr0, %rax
movq %cr4, %rbx
andl $0x7fffffff, %eax
@@ -50,7 +50,7 @@ _AsmTransferControl_al_0001:
movl %eax, %fs
movl %eax, %gs
movl %eax, %ss
- movl %cr0, %rax # Get control register 0
+ .byte 0x0f,0x20,0xc0 # movl %cr0, %eax ; Get control register 0
.byte 0x66
.byte 0x83,0xe0,0xfe # and eax, 0fffffffeh ; Clear PE bit (bit #0)
.byte 0xf,0x22,0xc0 # mov cr0, eax ; Activate real mode
The 2nd lret is reached (just before _AsmTransferControl_al_0001), but then the CPU goes off in the woods. For a while it seems to be spinning who knows where, and in 15-20 seconds or so the guest reboots.
Does gas support mode switches in one file? I found examples on the net (for nasm I think) where people were thunking to real mode and back to protected mode in a single assembly file, and they could use native mnemonics for each part. (They just switched the assembler's mode in sync with execution modes.)
Thanks,
Laszlo
* Re: [edk2] apparent KVM problem with LRET in TianoCore S3 resume trampoline
2013-12-05 22:38 ` Laszlo Ersek
@ 2013-12-05 22:53 ` Andrew Fish
2013-12-07 16:25 ` David Woodhouse
1 sibling, 0 replies; 16+ messages in thread
From: Andrew Fish @ 2013-12-05 22:53 UTC (permalink / raw)
To: edk2-devel; +Cc: Paolo Bonzini, KVM devel mailing list
On Dec 5, 2013, at 2:38 PM, Laszlo Ersek <lersek@redhat.com> wrote:
>
> Does gas support mode switches in one file? I found examples on the net (for nasm I think) where people were thunking to real mode and back to protected mode in a single assembly file, and they could use native mnemonics for each part. (They just switched the assembler's mode in sync with execution modes.)
Unfortunately the llvm assembler does not support 16-bit mode :(, and we try to keep the .S assembly common….
So it is possible, but then we have to add a big #ifdef for llvm/clang to use the .byte hackery to make that work.
Given the thrash on these 2 files, maybe it is worth doing the GNU native mnemonics version to get things working and then porting that back to llvm after it is stable? I can help with llvm/clang/Xcode related issues and porting.
Thanks,
Andrew Fish
* Re: [edk2] apparent KVM problem with LRET in TianoCore S3 resume trampoline
2013-12-05 18:29 ` Laszlo Ersek
@ 2013-12-06 12:03 ` Paolo Bonzini
2013-12-06 13:31 ` Paolo Bonzini
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Paolo Bonzini @ 2013-12-06 12:03 UTC (permalink / raw)
To: edk2-devel; +Cc: Laszlo Ersek, KVM devel mailing list
Il 05/12/2013 19:29, Laszlo Ersek ha scritto:
> On 12/05/13 18:42, Paolo Bonzini wrote:
>> Il 05/12/2013 17:12, Laszlo Ersek ha scritto:
>>> Hi,
>>>
>>> I'm working on S3 suspend/resume in OVMF. The problem is that I'm getting an
>>> unexpected guest reboot for code (LRET) that works on physical hardware. I
>>> tried to trace the problem with ftrace, but I didn't get any mentions of
>>> em_ret_far(). (Maybe I was looking in the wrong place.)
>>
>> What does ftrace say anyway?
>
> (pls. see in the next msg I sent)
Actually I meant the ftrace without any patches.
Thanks to your binary I now reproduced the issue and it looks like the
64-bit->16-bit switch works:
qemu-system-x86-4081 [001] 62650.335040: kvm_exit: reason CR_ACCESS rip 0x3cf7ae45 info 0 0
qemu-system-x86-4081 [001] 62650.335041: kvm_cr: cr_write 0 = 0x32
qemu-system-x86-4081 [001] 62650.335046: kvm_entry: vcpu 0
This is the "mov %rax, %cr0". PE and PG are turned off.
qemu-system-x86-4081 [001] 62650.335047: kvm_exit: reason MSR_READ rip 0x3cf7ae4e info 0 0
qemu-system-x86-4081 [001] 62650.335048: kvm_msr: msr_read c0000080 = 0x100
qemu-system-x86-4081 [001] 62650.335048: kvm_entry: vcpu 0
qemu-system-x86-4081 [001] 62650.335048: kvm_exit: reason MSR_WRITE rip 0x3cf7ae53 info 0 0
qemu-system-x86-4081 [001] 62650.335049: kvm_msr: msr_write c0000080 = 0x0
qemu-system-x86-4081 [001] 62650.335050: kvm_entry: vcpu 0
LME is turned off.
qemu-system-x86-4081 [001] 62650.335050: kvm_exit: reason CR_ACCESS rip 0x3cf7ae55 info 304 0
qemu-system-x86-4081 [001] 62650.335050: kvm_cr: cr_write 4 = 0x640
qemu-system-x86-4081 [001] 62650.335053: kvm_entry: vcpu 0
PAE is turned off.
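The annotations on the trace so far can be reproduced by decoding the written values against the relevant flag bits. The sketch below is illustrative only: the table and function names are made up, only the bits discussed in this thread are listed, and the "before" values for CR0, CR4 and EFER are taken from the register dump in the first message, which came from a different run than this trace.

```python
CR0_BITS = {"PE": 1 << 0, "PG": 1 << 31}
CR4_BITS = {"PAE": 1 << 5}
EFER_BITS = {"LME": 1 << 8, "LMA": 1 << 10}


def set_flags(value: int, bits: dict) -> list:
    """Names of the listed flag bits that are set in value."""
    return [name for name, mask in bits.items() if value & mask]


steps = [
    ("CR0", CR0_BITS, [0x80000033, 0x32]),       # dump value, then cr_write 0
    ("EFER", EFER_BITS, [0x500, 0x100, 0x0]),    # dump, msr_read, msr_write
    ("CR4", CR4_BITS, [0x660, 0x640]),           # dump value, then cr_write 4
]

for name, bits, values in steps:
    print(name, " -> ".join(f"{v:#x} {set_flags(v, bits)}" for v in values))
```

The EFER sequence shows LMA already clear at the time of the msr_read (0x100), consistent with the processor dropping LMA once CR0.PG was cleared, and LME cleared by the subsequent msr_write.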
qemu-system-x86-4081 [001] 62650.335054: kvm_exit: reason CR_ACCESS rip 0x11e6 info 0 0
qemu-system-x86-4081 [001] 62650.335054: kvm_cr: cr_write 0 = 0x33
qemu-system-x86-4081 [001] 62650.335054: kvm_entry: vcpu 0
Here we're already in real mode. The weird RIP is explained by
the first few bytes after the FACS resume vector:
0x9a1d:0000: cli
0x9a1d:0001: cld
0x9a1d:0002: ljmp $9900,$11d7
9900:11d7 is the same physical address as 9a1d:0007. Fast forward a bit:
qemu-system-x86-4081 [001] 62650.335071: kvm_exit: reason CR_ACCESS rip 0x9aec7 info 0 0
qemu-system-x86-4081 [001] 62650.335071: kvm_cr: cr_write 0 = 0x80010001
qemu-system-x86-4081 [001] 62650.335074: kvm_entry: vcpu 0
qemu-system-x86-4081 [001] 62650.335076: kvm_exit: reason TRIPLE_FAULT rip 0x0 info 0 0
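The claim that 9900:11d7 and 9a1d:0007 name the same byte follows from real-mode address arithmetic (segment times 16 plus offset). A minimal Python sketch, with a hypothetical helper name, and assuming the RIP values in the early real-mode trace lines are offsets relative to the 0x9900 segment:

```python
def real_mode_linear(segment: int, offset: int) -> int:
    """Linear address of segment:offset in real mode (A20 enabled, no wrap)."""
    return (segment << 4) + offset


a = real_mode_linear(0x9900, 0x11D7)   # target of the ljmp
b = real_mode_linear(0x9A1D, 0x0007)   # byte right after the 7-byte prologue
print(hex(a), hex(b), a == b)

# Under the stated assumption, the "weird" rip 0x11e6 from the trace maps to:
print(hex(real_mode_linear(0x9900, 0x11E6)))
```

Both expressions evaluate to 0x9a1d7, so the ljmp simply continues at the next instruction while renormalizing CS to 0x9900.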
Disassembling mixed 16-/32-/64-bit is a pain, so I ran QEMU with -no-shutdown
-no-reboot and dumped the memory with
(qemu) dump-guest-memory tramp.dmp 0x90000 0x10000
Let's look at the rest of the trace now. After the previous cr0 access we have:
qemu-system-x86-4081 [001] 62650.335055: kvm_exit: reason CR_ACCESS rip 0x11fa info 0 0
qemu-system-x86-4081 [001] 62650.335055: kvm_cr: cr_write 0 = 0x32
qemu-system-x86-4081 [001] 62650.335055: kvm_entry: vcpu 0
It gets out to real mode again. No idea why. It does some setup.
qemu-system-x86-4081 [001] 62650.335056: kvm_exit: reason MSR_WRITE rip 0x1258 info 0 0
qemu-system-x86-4081 [001] 62650.335056: kvm_msr: msr_write 1a0 = 0x1
qemu-system-x86-4081 [001] 62650.335057: kvm_entry: vcpu 0
qemu-system-x86-4081 [001] 62650.335057: kvm_exit: reason WBINVD rip 0x1001 info 0 0
qemu-system-x86-4081 [001] 62650.335057: kvm_entry: vcpu 0
qemu-system-x86-4081 [001] 62650.335058: kvm_exit: reason CPUID rip 0x1073 info 0 0
qemu-system-x86-4081 [001] 62650.335058: kvm_cpuid: func 0 rax 4 rbx 756e6547 rcx 6c65746e rdx 49656e69
qemu-system-x86-4081 [001] 62650.335059: kvm_entry: vcpu 0
qemu-system-x86-4081 [001] 62650.335059: kvm_exit: reason CPUID rip 0x10c0 info 0 0
qemu-system-x86-4081 [001] 62650.335059: kvm_cpuid: func 1 rax 663 rbx 800 rcx 80802001 rdx 78bfbfd
qemu-system-x86-4081 [001] 62650.335059: kvm_entry: vcpu 0
qemu-system-x86-4081 [001] 62650.335060: kvm_exit: reason CPUID rip 0x10ff info 0 0
qemu-system-x86-4081 [001] 62650.335060: kvm_cpuid: func 1 rax 663 rbx 800 rcx 80802001 rdx 78bfbfd
qemu-system-x86-4081 [001] 62650.335060: kvm_entry: vcpu 0
qemu-system-x86-4081 [001] 62650.335061: kvm_exit: reason CPUID rip 0x1117 info 0 0
qemu-system-x86-4081 [001] 62650.335061: kvm_cpuid: func 80000000 rax 8000000a rbx 756e6547 rcx 6c65746e rdx 49656e69
qemu-system-x86-4081 [001] 62650.335061: kvm_entry: vcpu 0
qemu-system-x86-4081 [001] 62650.335062: kvm_exit: reason CPUID rip 0x1127 info 0 0
qemu-system-x86-4081 [001] 62650.335062: kvm_cpuid: func 80000001 rax 663 rbx 0 rcx 1 rdx 2191abfd
qemu-system-x86-4081 [001] 62650.335062: kvm_entry: vcpu 0
qemu-system-x86-4081 [001] 62650.335063: kvm_exit: reason CPUID rip 0x113f info 0 0
qemu-system-x86-4081 [001] 62650.335063: kvm_cpuid: func 1 rax 663 rbx 800 rcx 80802001 rdx 78bfbfd
qemu-system-x86-4081 [001] 62650.335063: kvm_entry: vcpu 0
qemu-system-x86-4081 [001] 62650.335064: kvm_exit: reason CR_ACCESS rip 0x103c info 0 0
qemu-system-x86-4081 [001] 62650.335064: kvm_cr: cr_write 0 = 0x1
qemu-system-x86-4081 [001] 62650.335064: kvm_entry: vcpu 0
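When reading a trace like the one above, it can help to filter out everything except the exits. The following is only a sketch that assumes the text layout shown in these lines; the regular expression and the function name are my own, not part of any tracing tool.

```python
import re

# Matches the "kvm_exit: reason <NAME> rip 0x<hex>" part of a trace line.
EXIT_RE = re.compile(r"kvm_exit: reason (\S+) rip (0x[0-9a-f]+)")


def exit_reasons(lines):
    """Yield (reason, rip) for every kvm_exit line in an ftrace text dump."""
    for line in lines:
        match = EXIT_RE.search(line)
        if match:
            yield match.group(1), int(match.group(2), 16)


sample = [
    "qemu-system-x86-4081 [001] 62650.335064: kvm_exit: reason CR_ACCESS rip 0x103c info 0 0",
    "qemu-system-x86-4081 [001] 62650.335064: kvm_cr: cr_write 0 = 0x1",
    "qemu-system-x86-4081 [001] 62650.335064: kvm_entry: vcpu 0",
    "qemu-system-x86-4081 [001] 62650.335076: kvm_exit: reason TRIPLE_FAULT rip 0x0 info 0 0",
]

for reason, rip in exit_reasons(sample):
    print(f"{reason:13} rip={rip:#x}")
```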
Enabling protected mode:
0009A036 66B801000000 mov eax,0x1
0009A03C 0F22C0 mov cr0,eax
0009A03F 66EA90AE09000800 jmp dword 0x8:0x9ae90
This is a 32-bit selector.
0009AE90 8ED2 mov ss,edx
0009AE92 81C400900900 add esp,0x99000
0009AE98 8EDA mov ds,edx
0009AE9A 8EC2 mov es,edx
0009AE9C 8EE2 mov fs,edx
0009AE9E 8EEA mov gs,edx
qemu-system-x86-4081 [001] 62650.335065: kvm_exit: reason CR_ACCESS rip 0x9aea5 info 4 0
qemu-system-x86-4081 [001] 62650.335065: kvm_cr: cr_write 4 = 0x6f0
qemu-system-x86-4081 [001] 62650.335066: kvm_entry: vcpu 0
Enabling PAE:
0009AEA0 A110D00900 mov eax,[0x9d010]
0009AEA5 0F22E0 mov cr4,eax
qemu-system-x86-4081 [001] 62650.335067: kvm_exit: reason CR_ACCESS rip 0x9aead info 3 0
qemu-system-x86-4081 [001] 62650.335067: kvm_cr: cr_write 3 = 0x9c000
qemu-system-x86-4081 [001] 62650.335068: kvm_entry: vcpu 0
Setting CR3
0009AEA8 B800C00900 mov eax,0x9c000
0009AEAD 0F22D8 mov cr3,eax
qemu-system-x86-4081 [001] 62650.335068: kvm_exit: reason MSR_WRITE rip 0x9aec0 info 0 0
qemu-system-x86-4081 [001] 62650.335070: kvm_msr: msr_write c0000080 = 0x901
qemu-system-x86-4081 [001] 62650.335070: kvm_entry: vcpu 0
Enabling LME
0009AEB0 A108D00900 mov eax,[0x9d008]
0009AEB5 8B150CD00900 mov edx,[dword 0x9d00c]
0009AEBB B9800000C0 mov ecx,0xc0000080
0009AEC0 0F30 wrmsr
qemu-system-x86-4081 [001] 62650.335071: kvm_exit: reason CR_ACCESS rip 0x9aec7 info 0 0
qemu-system-x86-4081 [001] 62650.335071: kvm_cr: cr_write 0 = 0x80010001
qemu-system-x86-4081 [001] 62650.335074: kvm_entry: vcpu 0
Enabling paging
0009AEC2 B801000180 mov eax,0x80010001
0009AEC7 0F22C0 mov cr0,eax
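The values written in the steps above are easier to follow with the bit names spelled out. A small decoder, as a sketch: the bit positions are the architectural ones for CR0, CR4 and EFER, while the table layout and the decode() helper are just my illustration.

```python
CR0_BITS = {0: "PE", 1: "MP", 2: "EM", 3: "TS", 4: "ET", 5: "NE",
            16: "WP", 18: "AM", 29: "NW", 30: "CD", 31: "PG"}
CR4_BITS = {0: "VME", 1: "PVI", 2: "TSD", 3: "DE", 4: "PSE", 5: "PAE",
            6: "MCE", 7: "PGE", 8: "PCE", 9: "OSFXSR", 10: "OSXMMEXCPT"}
EFER_BITS = {0: "SCE", 8: "LME", 10: "LMA", 11: "NXE"}


def decode(value, names):
    """Return the names of all set bits, lowest bit first."""
    return [names.get(bit, f"bit{bit}")
            for bit in range(64) if (value >> bit) & 1]


# Values taken from the kvm_cr / kvm_msr lines in the trace.
print("cr0  0x1        ->", decode(0x1, CR0_BITS))
print("cr4  0x6f0      ->", decode(0x6F0, CR4_BITS))
print("efer 0x901      ->", decode(0x901, EFER_BITS))
print("cr0  0x80010001 ->", decode(0x80010001, CR0_BITS))
```

Read in order, this is the usual sequence for entering long mode: PE, then PAE, then CR3, then LME, then PG.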
But before we get here:
0009AECA EA30AF09001000 jmp dword 0x10:0x9af30
... kaboom:
qemu-system-x86-4081 [001] 62650.335076: kvm_exit: reason TRIPLE_FAULT rip 0x0 info 0 0
The page tables are, ahem, crap:
000c000: 6750 fe01 0000 0000 0000 0000 0000 0000 gP..............
000c010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000c020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000c030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000c040: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000c050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000c060: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000c070: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000c080: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000c090: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000c0a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000c0b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000c0c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000c0d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000c0e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000c0f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
This is 0x9c000. Does that ring a bell?
Paolo
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [edk2] apparent KVM problem with LRET in TianoCore S3 resume trampoline
2013-12-06 12:03 ` Paolo Bonzini
@ 2013-12-06 13:31 ` Paolo Bonzini
2013-12-06 13:46 ` Yao, Jiewen
2013-12-06 13:31 ` Yao, Jiewen
2013-12-08 17:43 ` Laszlo Ersek
2 siblings, 1 reply; 16+ messages in thread
From: Paolo Bonzini @ 2013-12-06 13:31 UTC (permalink / raw)
Cc: edk2-devel, Laszlo Ersek, KVM devel mailing list
Il 06/12/2013 13:03, Paolo Bonzini ha scritto:
> The page tables are, ahem, crap:
>
> 000c000: 6750 fe01 0000 0000 0000 0000 0000 0000 gP..............
> 000c010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c040: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c060: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c070: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c080: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c090: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c0a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c0b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c0c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c0d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c0e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c0f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
>
> This is 0x9c000. Strikes any bell?
Uh-oh, actually it's fine and it's my turn to say I didn't look far
enough.
That far jump is not where it's failing. It's quite close, but it's
at a much more interesting place. Still, indeed it's OVMF's fault.
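To see why the dump quoted above is fine after all, decode its first eight bytes as a little-endian 64-bit paging entry. This is a sketch under the assumption that the hex dump shows bytes in memory order (the "gP" in the ASCII column suggests so); the helper name is mine.

```python
# First eight bytes at 0x9c000, as shown in the hex dump: 6750 fe01 0000 0000
raw = bytes.fromhex("6750fe0100000000")
entry = int.from_bytes(raw, "little")

# Low flag bits of a paging-structure entry. Bit 6 is "dirty" only in an
# entry that maps a page; in an entry that points to a table it is ignored.
FLAGS = {0: "P", 1: "RW", 2: "US", 3: "PWT", 4: "PCD", 5: "A", 6: "D"}


def decode_entry(value):
    """Split an entry into (physical address of next level, set flag names)."""
    flags = [name for bit, name in FLAGS.items() if (value >> bit) & 1]
    address = value & 0x000F_FFFF_FFFF_F000  # bits 51:12
    return address, flags


address, flags = decode_entry(entry)
print(f"entry={entry:#x} next table at {address:#x} flags={flags}")
```

The entry is present and writable and points at 0x1fe5000, which is the address of the second kvm_page_fault in the EPT trace in this message. A single non-zero entry in the top-level table is all an identity map of low memory needs.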
I tried tracing again, this time without unrestricted_guest. I wanted to
see emulation around the time we enter long mode, but it went a little past
that place.
We get interesting results anyway, because the EPT tables are rebuilt when CR0.PG
changes:
qemu-system-x86-6785 [003] 67184.361164: kvm_exit: reason EPT_VIOLATION rip 0x9aeca info 81 0
qemu-system-x86-6785 [003] 67184.361165: kvm_page_fault: address 9c000 error_code 81
level 4
qemu-system-x86-6785 [003] 67184.361165: kvm_entry: vcpu 0
qemu-system-x86-6785 [003] 67184.361166: kvm_exit: reason EPT_VIOLATION rip 0x9aeca info 81 0
qemu-system-x86-6785 [003] 67184.361166: kvm_page_fault: address 1fe5000 error_code 81
qemu-system-x86-6785 [003] 67184.361168: kvm_mmu_get_page: new sp gfn 1e00 0/1 q0 direct --- !pge !nxe root 0 sync
level 3
qemu-system-x86-6785 [003] 67184.361169: kvm_entry: vcpu 0
qemu-system-x86-6785 [003] 67184.361169: kvm_exit: reason EPT_VIOLATION rip 0x9aeca info 81 0
qemu-system-x86-6785 [003] 67184.361169: kvm_page_fault: address 1fe6000 error_code 81
level 2
qemu-system-x86-6785 [003] 67184.361170: kvm_entry: vcpu 0
qemu-system-x86-6785 [003] 67184.361171: kvm_exit: reason EPT_VIOLATION rip 0x9aeca info 81 0
qemu-system-x86-6785 [003] 67184.361171: kvm_page_fault: address 1fe74d0 error_code 81
level 1 (note 0x4D0 means the 0x4D*2=0x9A-th entry, i.e. virtual address 0x9A000)
Another, simpler way to get this information would be to attach gdb to the running
machine. Setting breakpoints is easy (remember that they are virtual addresses,
and always use hardware breakpoints with "hb" so that you do not touch memory), but
gdb is complicated to use across mode switches, so we're quite lucky that tracing
got us what we need this quickly!
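The 0x4D0 offset in the level 1 note above follows from how a 4-level walk with 4 KiB pages splits a virtual address: nine index bits per level, eight bytes per entry. A minimal sketch (walk_offsets is a name made up for this example):

```python
def walk_offsets(vaddr):
    """Byte offsets of the entries used at levels 4, 3, 2, 1 of the walk."""
    return [((vaddr >> shift) & 0x1FF) * 8 for shift in (39, 30, 21, 12)]


# The far jump at 0x9aeca, executed right after paging is enabled.
print([hex(off) for off in walk_offsets(0x9AECA)])

# The kernel address that shows up later in the trace.
print([hex(off) for off in walk_offsets(0xFFFFFFFF81000110)])
```

For 0x9aeca only the last offset is non-zero, and it is 0x4d0, matching the fault on 1fe74d0. For 0xffffffff81000110 the level 3 and level 2 offsets come out as 0xff0 and 0x40, which match the low bits of the 1c0fff0 and 1c10040 faults further down.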
qemu-system-x86-6785 [003] 67184.361171: kvm_entry: vcpu 0
qemu-system-x86-6785 [003] 67184.361172: kvm_exit: reason EPT_VIOLATION rip 0xffffffff81000110 info 81 0
qemu-system-x86-6785 [003] 67184.361172: kvm_page_fault: address 1c0fff0 error_code 81
level 4
qemu-system-x86-6785 [003] 67184.361173: kvm_mmu_get_page: new sp gfn 1c00 0/1 q0 direct --- !pge !nxe root 0 sync
qemu-system-x86-6785 [003] 67184.361174: kvm_entry: vcpu 0
qemu-system-x86-6785 [003] 67184.361174: kvm_exit: reason EPT_VIOLATION rip 0xffffffff81000110 info 81 0
qemu-system-x86-6785 [003] 67184.361174: kvm_page_fault: address 1c10040 error_code 81
level 3. We should be here:
0xffffffff81000110: mov $0x1c0c000,%rax
0xffffffff81000117: mov $0xa0,%ecx
0xffffffff8100011c: mov %rcx,%cr4
0xffffffff8100011f: add 0xc12eea(%rip),%rax # 0xffffffff81c13010
0xffffffff81000126: mov %rax,%cr3
0xffffffff81000129: mov $0xffffffff81000132,%rax
0xffffffff81000130: jmpq *%rax
(grabbed from "dump-guest-memory -p" and gdb's disass command,
right after suspending the system)
qemu-system-x86-6785 [003] 67184.361175: kvm_entry: vcpu 0
qemu-system-x86-6785 [003] 67184.361176: kvm_exit: reason EPT_VIOLATION rip 0xffffffff81000113 info 181 0
qemu-system-x86-6785 [003] 67184.361176: kvm_page_fault: address 48 error_code 181
This rip is bogus! Let's grab another "dump-guest-memory -p", this
time after shutdown; remember I'm using -no-shutdown -no-reboot:
0xffffffff81000110: mov -0x18(%rbp),%eax
0xffffffff81000113: mov 0x48(%rax),%rax
0xffffffff81000117: mov -0x30(%rbp),%rsi
0xffffffff8100011b: lea -0x48(%rbp),%rdi
0xffffffff8100011f: mov -0x18(%rbp),%rcx
0xffffffff81000123: lea -0x40(%rbp),%rdx
0xffffffff81000127: mov %rdx,0x28(%rsp)
0xffffffff8100012c: lea -0x38(%rbp),%rdx
Uh oh. Something is corrupting virtual address 0xffffffff81000110,
which corresponds to physical address 0x1000110.
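For reference, the virtual-to-physical step above is a plain subtraction. This sketch assumes the guest is an x86-64 Linux kernel with the usual kernel text mapping base (0xffffffff80000000) and no relocation of the kernel image; the function name and the phys_base parameter are illustrative.

```python
KERNEL_TEXT_BASE = 0xFFFF_FFFF_8000_0000  # assumed x86-64 Linux kernel text mapping


def kernel_text_phys(vaddr, phys_base=0):
    """Physical address behind a kernel-text virtual address (assumption: no KASLR)."""
    return vaddr - KERNEL_TEXT_BASE + phys_base


print(hex(kernel_text_phys(0xFFFFFFFF81000110)))
```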
qemu-system-x86-6785 [003] 67184.361177: kvm_entry: vcpu 0
qemu-system-x86-6785 [003] 67184.361177: kvm_exit: reason EPT_VIOLATION rip 0xffffffff81000127 info 182 0
This rip is also bogus; no surprise that it triple faults soon after
qemu-system-x86-6785 [003] 67184.361177: kvm_page_fault: address 9e048 error_code 182
qemu-system-x86-6785 [003] 67184.361178: kvm_entry: vcpu 0
qemu-system-x86-6785 [003] 67184.361179: kvm_exit: reason TRIPLE_FAULT rip 0x0 info 0 0
Still an EDK2 problem. Perhaps you can dump the first few bytes of
0x1000110..0x100011f every time a PEIM is loaded?
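Once such per-PEIM dumps exist, finding the culprit is a matter of spotting the first dump that differs from the one before it. A sketch of that comparison follows; the labels and byte values are placeholders invented for the example, not real dumps or real PEIM names.

```python
def first_change(snapshots):
    """Return the label of the first snapshot that differs from its predecessor.

    snapshots is an iterable of (label, bytes) pairs in load order.
    Returns None if the watched bytes never change.
    """
    previous = None
    for label, data in snapshots:
        if previous is not None and data != previous:
            return label
        previous = data
    return None


# Hypothetical 16-byte dumps of 0x1000110..0x100011f, one per loaded PEIM.
dumps = [
    ("peim-1", bytes(16)),
    ("peim-2", bytes(16)),
    ("peim-3", bytes([0xFF] * 16)),  # placeholder for "the bytes changed here"
    ("peim-4", bytes([0xFF] * 16)),
]

print(first_change(dumps))
```

The change happened somewhere between the dump labeled by the result and the dump before it, so that interval is where to look for the stray write.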
Paolo
^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [edk2] apparent KVM problem with LRET in TianoCore S3 resume trampoline
2013-12-06 12:03 ` Paolo Bonzini
2013-12-06 13:31 ` Paolo Bonzini
@ 2013-12-06 13:31 ` Yao, Jiewen
2013-12-08 17:43 ` Laszlo Ersek
2 siblings, 0 replies; 16+ messages in thread
From: Yao, Jiewen @ 2013-12-06 13:31 UTC (permalink / raw)
To: edk2-devel@lists.sourceforge.net; +Cc: KVM devel mailing list
Hi Paolo
Do you mean that OVMF has already transferred control to the OS waking vector?
Is the system stuck just because the page table at 0x9C000 is corrupt?
Thank you
Yao Jiewen
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/edk2-devel
^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [edk2] apparent KVM problem with LRET in TianoCore S3 resume trampoline
2013-12-06 13:31 ` Paolo Bonzini
@ 2013-12-06 13:46 ` Yao, Jiewen
2013-12-06 14:29 ` Paolo Bonzini
0 siblings, 1 reply; 16+ messages in thread
From: Yao, Jiewen @ 2013-12-06 13:46 UTC (permalink / raw)
To: edk2-devel@lists.sourceforge.net; +Cc: KVM devel mailing list
Hi Paolo
I am a little confused here. You said "Still, indeed it's OVMF's fault." and "Still an EDK2 problem."
The EDKII BIOS should always create a 1:1 virtual-to-physical mapping, but I am not sure about the OS waking vector.
As for "EPT_VIOLATION rip 0xffffffff81000110": does that happen in the EDKII BIOS or in the OS waking vector?
All in all, I would first like to know one thing:
Does OVMF crash in the BIOS before jumping to the OS waking vector, or does it crash inside the OS waking vector?
Thank you
Yao Jiewen
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [edk2] apparent KVM problem with LRET in TianoCore S3 resume trampoline
2013-12-06 13:46 ` Yao, Jiewen
@ 2013-12-06 14:29 ` Paolo Bonzini
2013-12-06 14:47 ` Yao, Jiewen
0 siblings, 1 reply; 16+ messages in thread
From: Paolo Bonzini @ 2013-12-06 14:29 UTC (permalink / raw)
To: Yao, Jiewen; +Cc: edk2-devel@lists.sourceforge.net, KVM devel mailing list
Il 06/12/2013 14:46, Yao, Jiewen ha scritto:
> Hi Paolo
> I am a little confused here. You said "Still, indeed it's OVMF's fault." and "Still an EDK2 problem." ??????
Sorry for the confusion. I wrote OVMF/EDK2 interchangeably, just to say
"not KVM".
> EDKII BIOS should always create 1:1 mapping virtual-physical address. But I am not clear about OS waking vector.
> For "EPT_VIOLATION rip 0xffffffff81000110.", is that happen in EDKII BIOS or in OS waking vector?
That's after the OS waking vector is invoked. But that memory was
corrupted by EDKII/OVMF before the OS waking vector is invoked.
Paolo
> All in all, I have interesting to know one thing at first:
> Is OVMF crash in BIOS before jump to OS waking vector? Or is OVMF crash inside OS waking vector?
>
> Thank you
^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [edk2] apparent KVM problem with LRET in TianoCore S3 resume trampoline
2013-12-06 14:29 ` Paolo Bonzini
@ 2013-12-06 14:47 ` Yao, Jiewen
2013-12-06 14:51 ` Paolo Bonzini
0 siblings, 1 reply; 16+ messages in thread
From: Yao, Jiewen @ 2013-12-06 14:47 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: edk2-devel@lists.sourceforge.net, KVM devel mailing list
Good investigation. I really appreciate that.
Now it seems we need the OVMF package owner to check when 0x9c000 is corrupted, and why.
Thank you
Yao Jiewen
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [edk2] apparent KVM problem with LRET in TianoCore S3 resume trampoline
2013-12-06 14:47 ` Yao, Jiewen
@ 2013-12-06 14:51 ` Paolo Bonzini
0 siblings, 0 replies; 16+ messages in thread
From: Paolo Bonzini @ 2013-12-06 14:51 UTC (permalink / raw)
To: Yao, Jiewen; +Cc: edk2-devel@lists.sourceforge.net, KVM devel mailing list
Il 06/12/2013 15:47, Yao, Jiewen ha scritto:
> Good investigation. I really appreciate that.
>
> Now, it seems we need OVMF pkg owner to check when 0x9c000 are corrupted, and why.
FWIW it's 0x1000110, not 0x9c000. But everything else is right.
Paolo
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [edk2] apparent KVM problem with LRET in TianoCore S3 resume trampoline
2013-12-05 22:38 ` Laszlo Ersek
2013-12-05 22:53 ` Andrew Fish
@ 2013-12-07 16:25 ` David Woodhouse
1 sibling, 0 replies; 16+ messages in thread
From: David Woodhouse @ 2013-12-07 16:25 UTC (permalink / raw)
To: Laszlo Ersek; +Cc: Paolo Bonzini, edk2-devel, KVM devel mailing list
On Thu, 2013-12-05 at 23:38 +0100, Laszlo Ersek wrote:
>
> Does gas support mode switches in one file? I found examples on the
> net (for nasm I think) where people were thunking to real mode and
> back to protected mode in a single assembly file, and they could use
> native mnemonics for each part. (They just switched the assembler's
> mode in sync with execution modes.)
$DEITY yes. See the patch I posted to fix Thunk16.S last week, which
does exactly that. Without that, it's basically unmaintainable.
As Andrew points out, LLVM doesn't support .code16. There's a bug filed.
But frankly, I don't think we should care. Let them fix it. There *is*
active development on LLVM and it *can* be fixed, relatively easily.
It's not like we're talking about requiring fixes to the effectively
unmaintained Microsoft toolchain — which we can't even describe as
"stuck in the 20th century" since it doesn't even support the last C
standard from *that* century either :)
--
dwmw2
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [edk2] apparent KVM problem with LRET in TianoCore S3 resume trampoline
2013-12-06 12:03 ` Paolo Bonzini
2013-12-06 13:31 ` Paolo Bonzini
2013-12-06 13:31 ` Yao, Jiewen
@ 2013-12-08 17:43 ` Laszlo Ersek
2013-12-08 22:15 ` Laszlo Ersek
2 siblings, 1 reply; 16+ messages in thread
From: Laszlo Ersek @ 2013-12-08 17:43 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: edk2-devel, KVM devel mailing list
On 12/06/13 13:03, Paolo Bonzini wrote:
> Il 05/12/2013 19:29, Laszlo Ersek ha scritto:
>> On 12/05/13 18:42, Paolo Bonzini wrote:
>>> Il 05/12/2013 17:12, Laszlo Ersek ha scritto:
>>>> Hi,
>>>>
>>>> I'm working on S3 suspend/resume in OVMF. The problem is that I'm getting an
>>>> unexpected guest reboot for code (LRET) that works on physical hardware. I
>>>> tried to trace the problem with ftrace, but I didn't get any mentions of
>>>> em_ret_far(). (Maybe I was looking in the wrong place.)
>>>
>>> What does ftrace say anyway?
>>
>> (pls. see in the next msg I sent)
>
> Actually I meant the ftrace without any patches.
>
> Thanks to your binary I now reproduced the issue and it looks like the
> 64-bit->16-bit switch works:
Thank you for spending (apparently more than a little) time on this!
>
> qemu-system-x86-4081 [001] 62650.335040: kvm_exit: reason CR_ACCESS rip 0x3cf7ae45 info 0 0
> qemu-system-x86-4081 [001] 62650.335041: kvm_cr: cr_write 0 = 0x32
> qemu-system-x86-4081 [001] 62650.335046: kvm_entry: vcpu 0
>
> This is the "mov %rax, %cr0". PE and PG are turned off.
I'm surprised by this result. The instruction you refer to is below
"_AsmTransferControl_al_0000" (in the original, unpatched code).
I had earlier added an infinite loop right below that label (a different
loop than my xxxx debug loop), and it was *never* reached in my test.
That is, from the lret that I reported as problematic, to the
instruction you refer to, the CPU would have had to cross (and finish)
the infinite loop that I had added earlier. And that never happened in
my test.
I had added that loop immediately at "_AsmTransferControl_al_0000",
precisely because I wanted to see if the label is reached and the
problem is with something below that label, or with the first lret. I
sent my email to the KVM list after I had isolated the problem to the
first LRET:
http://thread.gmane.org/gmane.comp.bios.tianocore.devel/5297/focus=5325
On 12/04/13 19:05, Laszlo Ersek wrote:
> I tested if the (intended) target location of the LRET is reached, and
> it is not. (It's easy to test by adding a small infinite loop, moving
> it around, and seeing if the VM is spinning with or without producing
> a bunch of output on the debug port.) It's *really* that
> internally-targeted LRET that causes a reboot. [...]
I have absolutely no clue why this code executes for you and doesn't for
me :) What guest RAM size did you test with?
> qemu-system-x86-4081 [001] 62650.335047: kvm_exit: reason MSR_READ rip 0x3cf7ae4e info 0 0
> qemu-system-x86-4081 [001] 62650.335048: kvm_msr: msr_read c0000080 = 0x100
> qemu-system-x86-4081 [001] 62650.335048: kvm_entry: vcpu 0
> qemu-system-x86-4081 [001] 62650.335048: kvm_exit: reason MSR_WRITE rip 0x3cf7ae53 info 0 0
> qemu-system-x86-4081 [001] 62650.335049: kvm_msr: msr_write c0000080 = 0x0
> qemu-system-x86-4081 [001] 62650.335050: kvm_entry: vcpu 0
>
> LME is turned off.
>
> qemu-system-x86-4081 [001] 62650.335050: kvm_exit: reason CR_ACCESS rip 0x3cf7ae55 info 304 0
> qemu-system-x86-4081 [001] 62650.335050: kvm_cr: cr_write 4 = 0x640
> qemu-system-x86-4081 [001] 62650.335053: kvm_entry: vcpu 0
>
> PAE is turned off.
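As a rough cross-check of the three annotations above, the sketch below decodes the register values that appear in the trace (0x32 for CR0, 0x100 and 0x0 for EFER, 0x640 for CR4). The bit positions follow the architectural layout of CR0, CR4 and IA32_EFER; the helper `set_flags` and the lookup tables are hypothetical names used only for this illustration.

```python
# Bit positions follow the architectural layout of CR0, CR4 and IA32_EFER.
# `set_flags` and the tables below are illustrative names, not part of the thread.
CR0_BITS = {0: "PE", 1: "MP", 2: "EM", 3: "TS", 4: "ET", 5: "NE",
            16: "WP", 18: "AM", 29: "NW", 30: "CD", 31: "PG"}
CR4_BITS = {0: "VME", 1: "PVI", 2: "TSD", 3: "DE", 4: "PSE", 5: "PAE",
            6: "MCE", 7: "PGE", 8: "PCE", 9: "OSFXSR", 10: "OSXMMEXCPT"}
EFER_BITS = {0: "SCE", 8: "LME", 10: "LMA", 11: "NXE"}


def set_flags(value, names):
    """Return the names of the flag bits that are set in `value`, lowest bit first."""
    return [name for bit, name in sorted(names.items()) if (value >> bit) & 1]


print(set_flags(0x32, CR0_BITS))    # CR0 value written in the trace
print(set_flags(0x100, EFER_BITS))  # EFER value read in the trace
print(set_flags(0x0, EFER_BITS))    # EFER value written back
print(set_flags(0x640, CR4_BITS))   # CR4 value written in the trace
```

Neither PE nor PG shows up for 0x32, only LME is set in 0x100, and PAE is absent from 0x640, which matches the three remarks in the quoted text.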
>
> qemu-system-x86-4081 [001] 62650.335054: kvm_exit: reason CR_ACCESS rip 0x11e6 info 0 0
> qemu-system-x86-4081 [001] 62650.335054: kvm_cr: cr_write 0 = 0x33
> qemu-system-x86-4081 [001] 62650.335054: kvm_entry: vcpu 0
>
> Here we're already in real mode. The weird RIP is explained by
> the first few bytes after the FACS resume vector:
From this point on you were debugging the Linux wakeup code, in
"arch/x86/realmode/rm/wakeup_asm.S". I think.
>
> 0x9a1d:0000: cli
> 0x9a1d:0001: cld
> 0x9a1d:0002: ljmp $9900,$11d7
ENTRY(wakeup_start)
        cli
        cld
        LJMPW_RM(3f)
3:
        /* Apparently some dimwit BIOS programmers don't know how to
           program a PM to RM transition, and we might end up here with
           junk in the data segment descriptor registers. The only way
           to repair that is to go into PM and fix it ourselves... */
        [...]
From Linux kernel commit 4b4f7280.
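The "weird RIP" can be checked with plain real-mode address arithmetic. This is a minimal sketch: `linear` is a hypothetical helper, and it assumes that the far jump is the usual 5-byte encoding (opcode, 16-bit offset, 16-bit segment) and that CS is still 0x9900 when the later CR0 write traps.

```python
def linear(segment, offset):
    """Real-mode linear address: segment * 16 + offset."""
    return (segment << 4) + offset


# The ljmp sits at 0x9a1d:0002 and is assumed to be 5 bytes long,
# so the instruction after it starts at 0x9a1d:0007.
after_ljmp = linear(0x9A1D, 0x0007)
ljmp_target = linear(0x9900, 0x11D7)
print(hex(after_ljmp), hex(ljmp_target))

# Assuming CS is still 0x9900, the IP 0x11e6 from the trace lies shortly after the target.
trap_address = linear(0x9900, 0x11E6)
print(hex(trap_address), trap_address - ljmp_target)
```

Both expressions give the same linear address, so the far jump only renormalizes CS:IP and continues with the next instruction, which is what LJMPW_RM(3f) is for.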
> The page tables are, ahem, crap:
>
> 000c000: 6750 fe01 0000 0000 0000 0000 0000 0000 gP..............
> 000c010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c040: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c060: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c070: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c080: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c090: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c0a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c0b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c0c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c0d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c0e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c0f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
>
> This is 0x9c000. Strikes any bell?
We're wildly corrupting OS memory during OVMF S3 resume. That's a known
problem and the next stage for me to figure out (with Jordan's help
hopefully):
http://thread.gmane.org/gmane.comp.bios.tianocore.devel/5297/focus=5321
http://thread.gmane.org/gmane.comp.bios.tianocore.devel/5297/focus=5325
So, your tracing reached / debugged code that I had never ever reached.
And my report was precisely about not reaching it. Once we reach it,
it's expected to blow up, but first I wanted to get there.
Again, the 64-bit->16-bit switch (in the original, unpatched edk2/OVMF
code) never worked for me.
I think I did find the reason for that though, please see
http://thread.gmane.org/gmane.comp.bios.tianocore.devel/5343/focus=5365
especially the last patch attached to it.
The likely reason for the failure I was seeing is that the 16-bit code
had been relocated to way above 1MB and could not be addressed with the
16-bit CS:IP notation at all.
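To illustrate the limit, here is a small sketch of which linear addresses a 16-bit segment:offset pair can express at all. `real_mode_pair` is a hypothetical helper, and 0x3cf7ae45 is simply the 64-bit RIP from the quoted trace, used as a representative "way above 1MB" address.

```python
# Highest linear address that segment:offset can express (0xFFFF:0xFFFF).
REAL_MODE_MAX = (0xFFFF << 4) + 0xFFFF


def real_mode_pair(linear):
    """Return one (segment, offset) pair for `linear`, or None if none exists."""
    if linear < 0 or linear > REAL_MODE_MAX:
        return None
    segment = min(linear >> 4, 0xFFFF)
    return segment, linear - (segment << 4)


print(hex(REAL_MODE_MAX))
print(real_mode_pair(0x9A1D0))      # a low-memory address has a valid pair
print(real_mode_pair(0x3CF7AE45))   # an address near the trampoline's RIP has none
```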
Thanks!
Laszlo
* Re: [edk2] apparent KVM problem with LRET in TianoCore S3 resume trampoline
2013-12-08 17:43 ` Laszlo Ersek
@ 2013-12-08 22:15 ` Laszlo Ersek
0 siblings, 0 replies; 16+ messages in thread
From: Laszlo Ersek @ 2013-12-08 22:15 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: edk2-devel, KVM devel mailing list
On 12/08/13 18:43, Laszlo Ersek wrote:
> On 12/06/13 13:03, Paolo Bonzini wrote:
>> Thanks to your binary I now reproduced the issue and it looks like the
>> 64-bit->16-bit switch works:
>
> Thank you for spending (apparently more than a little) time on this!
>
>>
>> qemu-system-x86-4081 [001] 62650.335040: kvm_exit: reason CR_ACCESS rip 0x3cf7ae45 info 0 0
>> qemu-system-x86-4081 [001] 62650.335041: kvm_cr: cr_write 0 = 0x32
>> qemu-system-x86-4081 [001] 62650.335046: kvm_entry: vcpu 0
>>
>> This is the "mov %rax, %cr0". PE and PG are turned off.
>
> I'm surprised by this result. The instruction you refer to is below
> "_AsmTransferControl_al_0000" (in the original, unpatched code).
>
> I had earlier added an infinite loop right below that label (a different
> loop than my xxxx debug loop), and it was *never* reached in my test.
> That is, from the lret that I reported as problematic, to the
> instruction you refer to, the CPU would have had to cross (and finish)
> the infinite loop that I had added earlier. And that never happened in
> my test.
>
> I had added that loop immediately at "_AsmTransferControl_al_0000",
> precisely because I wanted to see if the label is reached and the
> problem is with something below that label, or with the first lret. I
> sent my email to the KVM list after I had isolated the problem to the
> first LRET:
>
> http://thread.gmane.org/gmane.comp.bios.tianocore.devel/5297/focus=5325
>
> On 12/04/13 19:05, Laszlo Ersek wrote:
>> I tested if the (intended) target location of the LRET is reached, and
>> it is not. (It's easy to test by adding a small infinite loop, moving
>> it around, and seeing if the VM is spinning with or without producing
>> a bunch of output on the debug port.) It's *really* that
>> internally-targeted LRET that causes a reboot. [...]
First, you're right and I'm wrong. I can see the triple fault in my kvm
trace as well.
Second... This is incredible. Guess what the following patch does.
 ASM_GLOBAL ASM_PFX(AsmTransferControl)
 ASM_PFX(AsmTransferControl):
     # rcx S3WakingVector    :DWORD
     # rdx AcpiLowMemoryBase :DWORD
     lea   _AsmTransferControl_al_0000(%rip), %eax
     movq  $0x2800000000, %r8
     orq   %r8, %rax
     pushq %rax
     shrd  $20, %ecx, %ebx
     andl  $0x0f, %ecx
     movw  %cx, %bx
     movl  %ebx, jmp_addr(%rip)
     lret
 _AsmTransferControl_al_0000:
     .byte 0x0b8, 0x30, 0      # mov ax, 30h as selector
     movl  %eax, %ds
     movl  %eax, %es
     movl  %eax, %fs
     movl  %eax, %gs
     movl  %eax, %ss
+.code16
+    movw  $0x402, %dx
+    movb  $0x58, %al
+qqq:
+    outb  %al, %dx
+    jmp   qqq
+    movb  $0x59, %al
+    outb  %al, %dx
+.code64
     movq  %cr0, %rax
     movq  %cr4, %rbx
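For reference, the shrd/andl/movw sequence in the listing turns the flat waking vector in ecx into a 16-bit segment:offset pair in ebx. The sketch below emulates those three instructions. `shrd32` and `pack_waking_vector` are hypothetical names, and the vector 0x9a1d0 is an assumption inferred from the 0x9a1d:0000 resume address quoted earlier in the thread.

```python
MASK32 = 0xFFFFFFFF


def shrd32(dest, src, count):
    """Emulate a 32-bit SHRD: shift dest right, filling the top bits from src's low bits."""
    return ((dest >> count) | (src << (32 - count))) & MASK32


def pack_waking_vector(ecx, ebx=0):
    """Mirror the trampoline: return ebx holding segment (high word) and offset (low word)."""
    ebx = shrd32(ebx, ecx, 20)                    # shrd $20, %ecx, %ebx
    ecx &= 0x0F                                   # andl $0x0f, %ecx
    return (ebx & 0xFFFF0000) | (ecx & 0xFFFF)    # movw %cx, %bx


# The previous content of ebx does not matter; 0xDEADBEEF stands in for stale data.
packed = pack_waking_vector(0x9A1D0, ebx=0xDEADBEEF)
print(hex(packed >> 16), hex(packed & 0xFFFF))
```

The high word is the vector shifted right by four bits and the low word is its lowest nibble, so only vectors below 1MB survive the conversion intact.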
The infinite loop that I used to "pin down" the first instruction that
broke around lret -- it was useless.
The above patch should produce an infinite string of "X" characters on
the qemu debug port. "After" the infinite string, it should produce a
"Y" character.
In practice, I'm seeing *one* "X" character, and then a triple fault.
The "jmp qqq" instruction *itself* causes a triple fault (a guest
reboot), masking the problem that I was looking for.
When I had such an empty infinite loop (at that time, still without the
outb) right before the lret, it worked. As soon as I moved it below
_AsmTransferControl_al_0000 (ie. under the target of the lret), the jmp
*itself* caused a triple fault (guest reboot), causing me to think that
the *lret* caused the reboot. Because, the tight jmp loop would "always"
work, and if I'm seeing a reboot loop instead of tight spinning right
after the lret, that means that the lret caused the reboot, right? Right?
This is the binary:
0000000000000037 <qqq>:
  37:   ee                      out    %al,(%dx)
  38:   eb fd                   jmp    37 <qqq>
Opcode  Instruction  Op/  64-Bit  Compat/   Description
                     En   Mode    Leg Mode
EB cb   JMP rel8     A    Valid   Valid     Jump short, RIP = RIP + 8-bit
                                            displacement sign extended
                                            to 64-bits
My belief in the sanity of x86 has been shattered to its core. I don't
feel stupid. I feel cheated.
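One reading that is consistent with the JMP description in the Intel SDM (an interpretation, not something stated in this message): after the lret the code runs in a 16-bit code segment, and with a 16-bit operand size a near relative jump clears the upper bits of the instruction pointer. A minimal sketch of that arithmetic follows. `short_jmp_target` is a hypothetical helper, and the address 0x3cf7ae38 is made up, chosen only to lie near the RIP values seen in the trace.

```python
def short_jmp_target(insn_addr, rel8_byte, operand_size):
    """Target of a 2-byte `EB cb` jump at insn_addr for the given operand size in bits."""
    # Sign-extend the 8-bit displacement.
    rel = rel8_byte - 0x100 if rel8_byte & 0x80 else rel8_byte
    # The displacement is relative to the address of the next instruction.
    target = insn_addr + 2 + rel
    # The result is truncated to the operand size (16 bits in a 16-bit code segment).
    return target & ((1 << operand_size) - 1)


JMP_ADDR = 0x3CF7AE38  # hypothetical runtime address of "jmp qqq"
print(hex(short_jmp_target(JMP_ADDR, 0xFD, 64)))  # as executed before the lret
print(hex(short_jmp_target(JMP_ADDR, 0xFD, 16)))  # as executed after the lret
```

Under this reading the same two bytes loop correctly in 64-bit mode but land at a 16-bit truncated address once the code segment is 16-bit, which would fit seeing exactly one "X" before the triple fault.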
Thank you for your help.
Laszlo
Thread overview: 16+ messages
2013-12-05 16:12 [edk2] apparent KVM problem with LRET in TianoCore S3 resume trampoline Laszlo Ersek
2013-12-05 16:50 ` Laszlo Ersek
2013-12-05 17:42 ` Paolo Bonzini
2013-12-05 18:29 ` Laszlo Ersek
2013-12-06 12:03 ` Paolo Bonzini
2013-12-06 13:31 ` Paolo Bonzini
2013-12-06 13:46 ` Yao, Jiewen
2013-12-06 14:29 ` Paolo Bonzini
2013-12-06 14:47 ` Yao, Jiewen
2013-12-06 14:51 ` Paolo Bonzini
2013-12-06 13:31 ` Yao, Jiewen
2013-12-08 17:43 ` Laszlo Ersek
2013-12-08 22:15 ` Laszlo Ersek
2013-12-05 22:38 ` Laszlo Ersek
2013-12-05 22:53 ` Andrew Fish
2013-12-07 16:25 ` David Woodhouse