* [GIT PULL] x86 setup: correct booting on 486DX4
From: H. Peter Anvin @ 2007-11-04 22:57 UTC
To: Linus Torvalds
Cc: Linux Kernel Mailing List, H. Peter Anvin, Thomas Gleixner,
Ingo Molnar, Mikael Petterson, Eric Biederman
Hi Linus; please pull:
git://git.kernel.org/pub/scm/linux/kernel/git/hpa/linux-2.6-x86setup.git for-linus
H. Peter Anvin (1):
x86 setup: correct booting on 486DX4
arch/x86/boot/pmjump.S | 32 +++++++++++++++++++++-----------
1 files changed, 21 insertions(+), 11 deletions(-)
[Full diff and log follows]
commit ac3b37b78c5f0f0be0b476a35370650f7bad482f
Author: H. Peter Anvin <hpa@zytor.com>
Date: Sun Nov 4 14:33:41 2007 -0800
x86 setup: correct booting on 486DX4
Apparently, the 486DX4 does not correctly serialize a mov to %cr0, so
we really do need the far jump immediately afterwards. This means
losing the nice separation between 16- and 32-bit code, but c'est la
vie.
Also pass %ebx = %edi = %ebp = 0 to support future extension of the
32-bit boot protocol.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
diff --git a/arch/x86/boot/pmjump.S b/arch/x86/boot/pmjump.S
index 2e55923..17e6dec 100644
--- a/arch/x86/boot/pmjump.S
+++ b/arch/x86/boot/pmjump.S
@@ -28,27 +28,37 @@
* void protected_mode_jump(u32 entrypoint, u32 bootparams);
*/
protected_mode_jump:
- xorl %ebx, %ebx # Flag to indicate this is a boot
movl %edx, %esi # Pointer to boot_params table
- movl %eax, 2f # Patch ljmpl instruction
+
+ xorl %edx, %edx
+ movw %cs, %dx
+ shll $4, %edx # Patch ljmpl instruction
+ addl %edx, 2f
jmp 1f # Short jump to flush instruction q.
1:
movw $__BOOT_DS, %cx
+ xorl %ebx, %ebx # Per protocol
+ xorl %ebp, %ebp # Per protocol
+ xorl %edi, %edi # Per protocol
movl %cr0, %edx
orb $1, %dl # Protected mode (PE) bit
movl %edx, %cr0
+
+ .byte 0x66, 0xea # ljmpl opcode
+2: .long 3f # Offset
+ .word __BOOT_CS # Segment
- movw %cx, %ds
- movw %cx, %es
- movw %cx, %fs
- movw %cx, %gs
- movw %cx, %ss
+ .code32
+3:
+ movl %ecx, %ds
+ movl %ecx, %es
+ movl %ecx, %fs
+ movl %ecx, %gs
+ movl %ecx, %ss
# Jump to the 32-bit entrypoint
- .byte 0x66, 0xea # ljmpl opcode
-2: .long 0 # offset
- .word __BOOT_CS # segment
-
+ jmpl *%eax
+
.size protected_mode_jump, .-protected_mode_jump
* Re: [GIT PULL] x86 setup: correct booting on 486DX4
From: Linus Torvalds @ 2007-11-04 23:17 UTC
To: H. Peter Anvin
Cc: Linux Kernel Mailing List, Thomas Gleixner, Ingo Molnar,
Mikael Petterson, Eric Biederman
On Sun, 4 Nov 2007, H. Peter Anvin wrote:
>
> Apparently, the 486DX4 does not correctly serialize a mov to %cr0, so
> we really do need the far jump immediately afterwards.
Hmm. I'm not sure I agree with the commit message.
This is documented behaviour on i386 and i486: instruction decoding is
decoupled from execution, so things that change processor mode have to do
a jump to make sure that %cr0 changes take effect.
I'm not entirely sure that it needs to be a long-jump, btw. I think any
regular branch is sufficient. You obviously *do* need to make the long
jump later (to reload %cs in protected mode), but I'm not sure it's needed
in that place. I forget the exact rules (but they definitely were
documented).
Linus
* Re: [GIT PULL] x86 setup: correct booting on 486DX4
From: Linus Torvalds @ 2007-11-04 23:25 UTC
To: H. Peter Anvin
Cc: Linux Kernel Mailing List, Thomas Gleixner, Ingo Molnar,
Mikael Petterson, Eric Biederman
On Sun, 4 Nov 2007, Linus Torvalds wrote:
>
> I'm not entirely sure that it needs to be a long-jump, btw. I think any
> regular branch is sufficient. You obviously *do* need to make the long
> jump later (to reload %cs in protected mode), but I'm not sure it's needed
> in that place. I forget the exact rules (but they definitely were
> documented).
Hmm. The original Linux code did
movw $1, %ax
lmsw %ax
jmp flush_instr
flush_instr:
and I think that was straight out of the documentation. So yeah, I think
that's the right fix - not a longjmp (which in itself is dangerous: it
potentially behaves *differently* on different CPUs, since some CPUs may
do the long jump with pre-protected-mode semantics, while others will do
it with protected mode already in effect!)
Linus
* Re: [GIT PULL] x86 setup: correct booting on 486DX4
From: H. Peter Anvin @ 2007-11-04 23:26 UTC
To: Linus Torvalds
Cc: Linux Kernel Mailing List, Thomas Gleixner, Ingo Molnar,
Mikael Petterson, Eric Biederman
Linus Torvalds wrote:
>
> On Sun, 4 Nov 2007, H. Peter Anvin wrote:
>>
>> Apparently, the 486DX4 does not correctly serialize a mov to %cr0, so
>> we really do need the far jump immediately afterwards.
>
> Hmm. I'm not sure I agree with the commit message.
>
> This is documented behaviour on i386 and i486: instruction decoding is
> decoupled from execution, so things that change processor mode have to do
> a jump to make sure that %cr0 changes take effect.
>
It's not an instruction-decoding issue at all (that's a 16- vs 32-bit
issue, which can only be changed by an ljmp). Apparently the 486DX4
mis-executes the segment register load, which is an EU (execution unit)
function in that context. (And yes, it's sort-of-documented behaviour in
the sense that the documentation says "do things this way", but the Intel
docs are unfortunately full of "do things this way" which don't make sense
and occasionally are actively harmful, too.)
> I'm not entirely sure that it needs to be a long-jump, btw. I think any
> regular branch is sufficient. You obviously *do* need to make the long
> jump later (to reload %cs in protected mode), but I'm not sure it's needed
> in that place. I forget the exact rules (but they definitely were
> documented).
That's exactly the issue here. The code without this patch deferred the
long jump until after the segment loads; this worked on all processors
except, apparently, the 486DX4. Hence, move the ljmp up to the earliest
possible location.
-hpa
* Re: [GIT PULL] x86 setup: correct booting on 486DX4
From: Jeremy Fitzhardinge @ 2007-11-04 23:27 UTC
To: Linus Torvalds
Cc: H. Peter Anvin, Linux Kernel Mailing List, Thomas Gleixner,
Ingo Molnar, Mikael Petterson, Eric Biederman
Linus Torvalds wrote:
> I'm not entirely sure that it needs to be a long-jump, btw. I think any
> regular branch is sufficient. You obviously *do* need to make the long
> jump later (to reload %cs in protected mode), but I'm not sure it's needed
> in that place. I forget the exact rules (but they definitely were
> documented).
Yes, it says it needs to be a far jmp or call (and if you enabled paging
in the cr0 load, you need to identity-map the branch target). Having
successfully broken the rules for a long time so far, maybe we can get
away with still cutting corners... but it doesn't seem particularly
worthwhile since we've been caught once.
J
* Re: [GIT PULL] x86 setup: correct booting on 486DX4
From: H. Peter Anvin @ 2007-11-04 23:36 UTC
To: Linus Torvalds
Cc: Linux Kernel Mailing List, Thomas Gleixner, Ingo Molnar,
Mikael Petterson, Eric Biederman
Linus Torvalds wrote:
>
> On Sun, 4 Nov 2007, Linus Torvalds wrote:
>> I'm not entirely sure that it needs to be a long-jump, btw. I think any
>> regular branch is sufficient. You obviously *do* need to make the long
>> jump later (to reload %cs in protected mode), but I'm not sure it's needed
>> in that place. I forget the exact rules (but they definitely were
>> documented).
>
> Hmm. The original Linux code did
>
> movw $1, %ax
> lmsw %ax
> jmp flush_instr
> flush_instr:
>
> and I think that was straight out of the documentation. So yeah, I think
> that's the right fix - not a longjmp (which in itself is dangerous: it
> potentially behaves *differently* on different CPUs, since some CPUs may
> do the long jump with pre-protected-mode semantics, while others will do
> it with protected mode already in effect!)
>
Just looked it up; it was a bit hard to find (it is Intel vol 3 page
9-27, at least in the version I have), but you're right -- the
documentation only demands a short jump here, not a long jmp (which
actually makes sense, given my recollection that the long jump can be
deferred here). So yes, that is definitely the right fix, and it avoids
the ugly mixing of 16- and 32-bit code.
I'll update the patch.
-hpa
* Re: [GIT PULL] x86 setup: correct booting on 486DX4
From: Linus Torvalds @ 2007-11-04 23:59 UTC
To: H. Peter Anvin, Jeremy Fitzhardinge
Cc: Linux Kernel Mailing List, Thomas Gleixner, Ingo Molnar,
Mikael Petterson, Eric Biederman
On Sun, 4 Nov 2007, H. Peter Anvin wrote:
>
> It's not an instruction-decoding issue at all (that's a 16- vs 32-bit issue,
> which can only be changed by a ljmp). Apparently the 486DX4 mis-executes the
> load to segment register, which is an EU function in that context. (And yes,
> it's sort-of-documented behaviour in the sense that the documentation says "do
> things this way", but the Intel docs are unfortunately full of "do things this
> way" which don't make sense and occasionally are actively harmful, too.)
I still disagree.
I took out "Programming the 80386" just to check, and the documentation
very clearly states that when changing the CR0 bits (I quote):
"The program must execute a jump instruction immediately after
changing the value of the PE bit in order to flush the execution
pipeline of any instructions that may have been fetched in the
wrong mode. [...]"
In other words, not only has this been documented since day 1, it makes total
sense, and they even said exactly *why* that jump had to be done.
In fact, there's even a code example. It's page 624 in my copy of the
book, and yes, it has a short jump to flush things, followed by a long
jump. The code there looks like this:
; *****
; ** [4] Enter Protected Mode
; *****
SMSW AX
OR AX, PE
LMSW AX
JMP Flush
Flush:
JMP far ptr Start32
which is pretty damn conclusive. It's documented, it has examples, it
works. In other words, it's how you should do things.
And Linux always did it correctly. I don't understand why you disagree,
and why Jeremy says
"Having successfully broken the rules for a long time so far,
maybe we can get away with still cutting corners..."
when the fact is, we used to *not* cut corners, we used to *not* break the
rules, and what we used to do (a short jump immediately after setting PE)
was exactly what Intel always said you should do, and there is no question
what-so-ever about it.
So here's a suggestion:
- make the code do what it used to do. A regular jump to flush the
pipeline. Which is what Intel has always said should be done.
and I really don't see that there is any argument about this.
Linus
* Re: [GIT PULL] x86 setup: correct booting on 486DX4
From: H. Peter Anvin @ 2007-11-05 0:02 UTC
To: Linus Torvalds
Cc: Jeremy Fitzhardinge, Linux Kernel Mailing List, Thomas Gleixner,
Ingo Molnar, Mikael Petterson, Eric Biederman
Linus Torvalds wrote:
>
> And Linux always did it correctly. I don't understand why you disagree,
> and why Jeremy says
>
> "Having successfully broken the rules for a long time so far,
> maybe we can get away with still cutting corners..."
>
> when the fact is, we used to *not* cut corners, we used to *not* break the
> rules, and what we used to do (a short jump immediately after setting PE)
> was exactly what Intel always said you should do, and there is no question
> what-so-ever about it.
>
Apparently because the Intel documentation disagrees with itself.
That's all.
-hpa
* Re: [GIT PULL] x86 setup: correct booting on 486DX4
From: H. Peter Anvin @ 2007-11-05 0:12 UTC
To: Linus Torvalds
Cc: Jeremy Fitzhardinge, Linux Kernel Mailing List, Thomas Gleixner,
Ingo Molnar, Mikael Petterson, Eric Biederman
H. Peter Anvin wrote:
>
> Apparently because the Intel documentation disagrees with itself. That's
> all.
>
Just to be perfectly clear: I much prefer the code with the short (near)
jump, because it keeps the code cleaner. I have sent a patch to Mikael
to test out.
-hpa
* Re: [GIT PULL] x86 setup: correct booting on 486DX4
From: Eric W. Biederman @ 2007-11-05 0:14 UTC
To: H. Peter Anvin
Cc: Linus Torvalds, Linux Kernel Mailing List, Thomas Gleixner,
Ingo Molnar, Mikael Petterson
"H. Peter Anvin" <hpa@zytor.com> writes:
> Hi Linus; please pull:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/hpa/linux-2.6-x86setup.git
> for-linus
>
> H. Peter Anvin (1):
> x86 setup: correct booting on 486DX4
>
> arch/x86/boot/pmjump.S | 32 +++++++++++++++++++++-----------
> 1 files changed, 21 insertions(+), 11 deletions(-)
>
> [Full diff and log follows]
Looks reasonable to me.
* Re: [GIT PULL] x86 setup: correct booting on 486DX4
From: Eric W. Biederman @ 2007-11-05 0:43 UTC
To: H. Peter Anvin
Cc: Linus Torvalds, Jeremy Fitzhardinge, Linux Kernel Mailing List,
Thomas Gleixner, Ingo Molnar, Mikael Petterson
"H. Peter Anvin" <hpa@zytor.com> writes:
> Linus Torvalds wrote:
>>
>> And Linux always did it correctly. I don't understand why you disagree, and
>> why Jeremy says
>>
>> "Having successfully broken the rules for a long time so far, maybe
>> we can get away with still cutting corners..."
>>
>> when the fact is, we used to *not* cut corners, we used to *not* break the
>> rules, and what we used to do (a short jump immediately after setting PE) was
>> exactly what Intel always said you should do, and there is no question
>> what-so-ever about it.
>>
>
> Apparently because the Intel documentation disagrees with itself. That's all.
Yes. Let's go back to the tested version with the short jump; it
looks safest, as it is what we have always done, and we certainly need some
kind of jump in there.
I do seem to recall etherboot having a far jump in that spot and it
working on everything from a 386 on up. So I'm not certain if the
kind of jump matters. Still the kernel has a lot more exposure.
At the same time, it does look like we really do enter protected mode
with a valid GDT after the short jump, so doing the segment loads in
32-bit mode, as I did originally, looks like it was excessively
conservative.
Eric
* Re: [GIT PULL] x86 setup: correct booting on 486DX4
From: Linus Torvalds @ 2007-11-05 1:10 UTC
To: Eric W. Biederman
Cc: H. Peter Anvin, Jeremy Fitzhardinge, Linux Kernel Mailing List,
Thomas Gleixner, Ingo Molnar, Mikael Petterson
On Sun, 4 Nov 2007, Eric W. Biederman wrote:
>
> I do seem to recall etherboot having a far jump in that spot and it
> working on everything from a 386 on up. So I'm not certain if the
> kind of jump matters. Still the kernel has a lot more exposure.
I actually suspect you could have just about anything in there, including
just a couple of nops, or just avoiding certain instructions for a few
cycles.
The i386/i486 pipeline isn't actually all that long (I can't find it here,
but I want to say it was just five stages), and the whole/only issue with
writing to cr0 on those CPUs is literally that there isn't any forwarding
of the cr0 state, so any instruction that actually depends on the cr0
value needs to have that value stable in the register by the time it
executes.
So I literally suspect that just a couple of no-ops in between the move to
cr0 and any instruction that depends on the state of the PE bit would be
ok. And there aren't that many instructions that do, it's generally just
the ones that load a segment that can care.
But I'd actually be worried about an ljmp directly after the "move to
cr0", exactly because an ljmp actually does have semantic dependencies on
the PE bit. But it's quite likely that ljmp is microcoded (it takes 12+
cycles even in real mode), and since microcode was non-pipelined, that
would hide it.
But "move to segment" is definitely *not* microcoded in real mode (it's
documented as just two cycles for reg->seg), so I'm not at all surprised
that "mov->cr0" followed immediately by "mov->seg" will not work.
In short:
- far jumps are in the "dangerous instruction" category after a change to
PE. I would suggest not using it, although I also suspect that it
probably works if only because it's probably microcoded on at least an
i386.
- instead of a short taken jump, you can almost certainly use anything
that is microcoded or just otherwise takes enough cycles (where
"enough" is likely in the 5-10 range) to make sure the writeback to CR0
is stable by the time any instruction uses it.
- almost anything that doesn't actually involve a segment descriptor
lookup is probably not going to care at all about the value of PE. The
PE bit really doesn't affect all that much of the x86 instruction set,
and if an instruction doesn't care, it doesn't matter whether it's
executed with the old or the new value.
Linus