* More E820 brokenness
@ 2007-09-27 22:17 H. Peter Anvin
2007-09-27 22:33 ` Jordan Crouse
0 siblings, 1 reply; 11+ messages in thread
From: H. Peter Anvin @ 2007-09-27 22:17 UTC (permalink / raw)
To: jkeating, Joerg Pommnitz, Jordan Crouse, Chuck Ebbert
Cc: Linux Kernel Mailing List, Andi Kleen
[-- Attachment #1: Type: text/plain, Size: 1027 bytes --]
As luck would have it, it's not just an obscure Geode system which has a
broken E820 implementation. Today I received a bug report about a Dell
system (XPS M1330) with broken E820.
Unfortunately, the workaround for the Geode breaks this system, because
x86-64 doesn't fall back to the e801/88 information like the i386 kernel
does.
I wonder if the relevant people could test out this patch to see how it
works on their respective system. This patch reverts to 2.6.23-rc8
behaviour of simply truncating the map, but still makes e801/88 info
available to the kernel; this hopefully should match 2.6.22 behaviour.
I want to emphasize that this is seriously broken. Using a partial e820
map could have disastrous results, since the kernel will have partial
memory map information and not know about reserved areas, etc. Part of
me feels that the right thing to do is what the current git kernel does
-- either fall back to e801, or stop and error.
(Andi: I would particularly appreciate your opinion on this issue.)
-hpa
[-- Attachment #2: diff --]
[-- Type: text/plain, Size: 642 bytes --]
diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
index bccaa1c..84939b7 100644
--- a/arch/i386/boot/memory.c
+++ b/arch/i386/boot/memory.c
@@ -34,17 +34,7 @@ static int detect_memory_e820(void)
"=m" (*desc)
: "D" (desc), "a" (0xe820));
- /* Some BIOSes stop returning SMAP in the middle of
- the search loop. We don't know exactly how the BIOS
- screwed up the map at that point, we might have a
- partial map, the full map, or complete garbage, so
- just return failure. */
- if (id != SMAP) {
- count = 0;
- break;
- }
-
- if (err)
+ if (id != SMAP || err)
break;
count++;
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: More E820 brokenness
2007-09-27 22:17 More E820 brokenness H. Peter Anvin
@ 2007-09-27 22:33 ` Jordan Crouse
2007-09-27 22:47 ` H. Peter Anvin
0 siblings, 1 reply; 11+ messages in thread
From: Jordan Crouse @ 2007-09-27 22:33 UTC (permalink / raw)
To: H. Peter Anvin
Cc: jkeating, Joerg Pommnitz, Chuck Ebbert, Linux Kernel Mailing List,
Andi Kleen
On 27/09/07 15:17 -0700, H. Peter Anvin wrote:
> As luck would have it, it's not just an obscure Geode system which has a
> broken E820 implementation. Today I received a bug report about a Dell
> system (XPS M1330) with broken E820.
>
> Unfortunately, the workaround for the Geode breaks this system, because
> x86-64 doesn't fall back to the e801/88 information like the i386 kernel
> does.
>
> I wonder if the relevant people could test out this patch to see how it
> works on their respective system. This patch reverts to 2.6.23-rc8
> behaviour of simply truncating the map, but still makes e801/88 info
> available to the kernel; this hopefully should match 2.6.22 behaviour.
Breaks on the Geode - original behavior.
I think that having boot_prams.e820_entries != 0 makes the kernel
assume the e820 data is correct.
> I want to emphasize that this is seriously broken. Using a partial e820
> map could have disastrous results, since the kernel will have partial
> memory map information and not know about reserved areas, etc. Part of
> me feels that the right thing to do is what the current git kernel does
> -- either fall back to e801, or stop and error.
I'm inclined to agree.
Jordan
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: More E820 brokenness
2007-09-27 22:33 ` Jordan Crouse
@ 2007-09-27 22:47 ` H. Peter Anvin
2007-09-27 23:15 ` Jordan Crouse
0 siblings, 1 reply; 11+ messages in thread
From: H. Peter Anvin @ 2007-09-27 22:47 UTC (permalink / raw)
To: Jordan Crouse
Cc: jkeating, Joerg Pommnitz, Chuck Ebbert, Linux Kernel Mailing List,
Andi Kleen
Jordan Crouse wrote:
>
> Breaks on the Geode - original behavior.
>
> I think that having boot_prams.e820_entries != 0 makes the kernel
> assume the e820 data is correct.
>
Okay, now I'm utterly baffled how 2.6.22 ever worked on this Geode,
because this, to the best of my reading, mimics the 2.6.22 behavior
exactly. DID IT REALLY, and/or did you make any kind of configuration
changes?
>> I want to emphasize that this is seriously broken. Using a partial e820
>> map could have disastrous results, since the kernel will have partial
>> memory map information and not know about reserved areas, etc. Part of
>> me feels that the right thing to do is what the current git kernel does
>> -- either fall back to e801, or stop and error.
>
> I'm inclined to agree.
Arguably the right thing to do is to find the responsible BIOS engineer
and shoot them, but that's hard to do without robotics.
-hpa
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: More E820 brokenness
2007-09-27 22:47 ` H. Peter Anvin
@ 2007-09-27 23:15 ` Jordan Crouse
2007-09-27 23:22 ` H. Peter Anvin
2007-09-27 23:27 ` H. Peter Anvin
0 siblings, 2 replies; 11+ messages in thread
From: Jordan Crouse @ 2007-09-27 23:15 UTC (permalink / raw)
To: H. Peter Anvin
Cc: jkeating, Joerg Pommnitz, Chuck Ebbert, Linux Kernel Mailing List,
Andi Kleen
On 27/09/07 15:47 -0700, H. Peter Anvin wrote:
> Jordan Crouse wrote:
> >
> > Breaks on the Geode - original behavior.
> >
> > I think that having boot_prams.e820_entries != 0 makes the kernel
> > assume the e820 data is correct.
> >
>
> Okay, now I'm utterly baffled how 2.6.22 ever worked on this Geode,
> because this, to the best of my reading, mimics the 2.6.22 behavior
> exactly. DID IT REALLY, and/or did you make any kind of configuration
> changes?
I copied in a 2.6.22 kernel to see that it really did work, and it did.
But here's the crazy part - I did a dmesg, and it looks like it
*is* using e820 data, and it looks complete (I see the entire map -
including the ACPI and reserved blocks way up high).
So apparently it was the 2.6.22 code that was buggy, but reading it,
I don't immediately see how.
Jordan
--
Jordan Crouse
Systems Software Development Engineer
Advanced Micro Devices, Inc.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: More E820 brokenness
2007-09-27 23:15 ` Jordan Crouse
@ 2007-09-27 23:22 ` H. Peter Anvin
2007-09-27 23:27 ` H. Peter Anvin
1 sibling, 0 replies; 11+ messages in thread
From: H. Peter Anvin @ 2007-09-27 23:22 UTC (permalink / raw)
To: Jordan Crouse
Cc: jkeating, Joerg Pommnitz, Chuck Ebbert, Linux Kernel Mailing List,
Andi Kleen
Jordan Crouse wrote:
>
> I copied in a 2.6.22 kernel to see that it really did work, and it did.
> But here's the crazy part - I did a dmesg, and it looks like it
> *is* using e820 data, and it looks complete (I see the entire map -
> including the ACPI and reserved blocks way up high).
>
> So apparently it was the 2.6.22 code that was buggy, but reading it,
> I don't immediately see how.
>
Was this a stock 2.6.22 kernel, or might it have been modified?
There is, of course, also the possibility that triggering the BIOS bug
in your case depends on some delicate combination of input state.
-hpa
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: More E820 brokenness
2007-09-27 23:15 ` Jordan Crouse
2007-09-27 23:22 ` H. Peter Anvin
@ 2007-09-27 23:27 ` H. Peter Anvin
2007-09-27 23:34 ` Jordan Crouse
1 sibling, 1 reply; 11+ messages in thread
From: H. Peter Anvin @ 2007-09-27 23:27 UTC (permalink / raw)
To: Jordan Crouse
Cc: jkeating, Joerg Pommnitz, Chuck Ebbert, Linux Kernel Mailing List,
Andi Kleen
[-- Attachment #1: Type: text/plain, Size: 1048 bytes --]
Jordan Crouse wrote:
> On 27/09/07 15:47 -0700, H. Peter Anvin wrote:
>> Jordan Crouse wrote:
>>> Breaks on the Geode - original behavior.
>>>
>>> I think that having boot_prams.e820_entries != 0 makes the kernel
>>> assume the e820 data is correct.
>>>
>> Okay, now I'm utterly baffled how 2.6.22 ever worked on this Geode,
>> because this, to the best of my reading, mimics the 2.6.22 behavior
>> exactly. DID IT REALLY, and/or did you make any kind of configuration
>> changes?
>
> I copied in a 2.6.22 kernel to see that it really did work, and it did.
> But here's the crazy part - I did a dmesg, and it looks like it
> *is* using e820 data, and it looks complete (I see the entire map -
> including the ACPI and reserved blocks way up high).
>
> So apparently it was the 2.6.22 code that was buggy, but reading it,
> I don't immediately see how.
>
Oh bugger, looks like this one might be genuinely my fault after all.
The ID check in the new code is buggy.
Can you please test this revised patch out (against current -git)?
-hpa
[-- Attachment #2: diff --]
[-- Type: text/plain, Size: 642 bytes --]
diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
index bccaa1c..84939b7 100644
--- a/arch/i386/boot/memory.c
+++ b/arch/i386/boot/memory.c
@@ -34,17 +34,7 @@ static int detect_memory_e820(void)
"=m" (*desc)
: "D" (desc), "a" (0xe820));
- /* Some BIOSes stop returning SMAP in the middle of
- the search loop. We don't know exactly how the BIOS
- screwed up the map at that point, we might have a
- partial map, the full map, or complete garbage, so
- just return failure. */
- if (id != SMAP) {
- count = 0;
- break;
- }
-
- if (err)
+ if (id != SMAP || err)
break;
count++;
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: More E820 brokenness
2007-09-27 23:27 ` H. Peter Anvin
@ 2007-09-27 23:34 ` Jordan Crouse
2007-09-27 23:36 ` H. Peter Anvin
0 siblings, 1 reply; 11+ messages in thread
From: Jordan Crouse @ 2007-09-27 23:34 UTC (permalink / raw)
To: H. Peter Anvin
Cc: jkeating, Joerg Pommnitz, Chuck Ebbert, Linux Kernel Mailing List,
Andi Kleen
On 27/09/07 16:27 -0700, H. Peter Anvin wrote:
> Jordan Crouse wrote:
> > On 27/09/07 15:47 -0700, H. Peter Anvin wrote:
> >> Jordan Crouse wrote:
> >>> Breaks on the Geode - original behavior.
> >>>
> >>> I think that having boot_prams.e820_entries != 0 makes the kernel
> >>> assume the e820 data is correct.
> >>>
> >> Okay, now I'm utterly baffled how 2.6.22 ever worked on this Geode,
> >> because this, to the best of my reading, mimics the 2.6.22 behavior
> >> exactly. DID IT REALLY, and/or did you make any kind of configuration
> >> changes?
> >
> > I copied in a 2.6.22 kernel to see that it really did work, and it did.
> > But here's the crazy part - I did a dmesg, and it looks like it
> > *is* using e820 data, and it looks complete (I see the entire map -
> > including the ACPI and reserved blocks way up high).
> >
> > So apparently it was the 2.6.22 code that was buggy, but reading it,
> > I don't immediately see how.
> >
>
> Oh bugger, looks like this one might be genuinely my fault after all.
> The ID check in the new code is buggy.
>
> Can you please test this revised patch out (against current -git)?
> -hpa
>
>
>
> diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
> index bccaa1c..84939b7 100644
> --- a/arch/i386/boot/memory.c
> +++ b/arch/i386/boot/memory.c
> @@ -34,17 +34,7 @@ static int detect_memory_e820(void)
> "=m" (*desc)
> : "D" (desc), "a" (0xe820));
>
> - /* Some BIOSes stop returning SMAP in the middle of
> - the search loop. We don't know exactly how the BIOS
> - screwed up the map at that point, we might have a
> - partial map, the full map, or complete garbage, so
> - just return failure. */
> - if (id != SMAP) {
> - count = 0;
> - break;
> - }
> -
> - if (err)
> + if (id != SMAP || err)
> break;
>
> count++;
That looks the same as the previous patch you sent?
Jordan
--
Jordan Crouse
Systems Software Development Engineer
Advanced Micro Devices, Inc.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: More E820 brokenness
2007-09-27 23:34 ` Jordan Crouse
@ 2007-09-27 23:36 ` H. Peter Anvin
2007-09-27 23:54 ` Jordan Crouse
0 siblings, 1 reply; 11+ messages in thread
From: H. Peter Anvin @ 2007-09-27 23:36 UTC (permalink / raw)
To: Jordan Crouse
Cc: jkeating, Joerg Pommnitz, Chuck Ebbert, Linux Kernel Mailing List,
Andi Kleen
[-- Attachment #1: Type: text/plain, Size: 316 bytes --]
Jordan Crouse wrote:
>>>
>> Oh bugger, looks like this one might be genuinely my fault after all.
>> The ID check in the new code is buggy.
>>
>> Can you please test this revised patch out (against current -git)?
>
>
> That looks the same as the previous patch you sent?
>
Sorry, this is the right one...
-hpa
[-- Attachment #2: smap.patch --]
[-- Type: text/x-patch, Size: 636 bytes --]
diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
index bccaa1c..2f37568 100644
--- a/arch/i386/boot/memory.c
+++ b/arch/i386/boot/memory.c
@@ -28,11 +28,10 @@ static int detect_memory_e820(void)
do {
size = sizeof(struct e820entry);
- id = SMAP;
asm("int $0x15; setc %0"
- : "=am" (err), "+b" (next), "+d" (id), "+c" (size),
+ : "=dm" (err), "+b" (next), "=a" (id), "+c" (size),
"=m" (*desc)
- : "D" (desc), "a" (0xe820));
+ : "D" (desc), "d" (SMAP), "a" (0xe820));
/* Some BIOSes stop returning SMAP in the middle of
the search loop. We don't know exactly how the BIOS
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: More E820 brokenness
2007-09-27 23:36 ` H. Peter Anvin
@ 2007-09-27 23:54 ` Jordan Crouse
2007-09-28 0:12 ` H. Peter Anvin
0 siblings, 1 reply; 11+ messages in thread
From: Jordan Crouse @ 2007-09-27 23:54 UTC (permalink / raw)
To: H. Peter Anvin
Cc: jkeating, Joerg Pommnitz, Chuck Ebbert, Linux Kernel Mailing List,
Andi Kleen
On 27/09/07 16:36 -0700, H. Peter Anvin wrote:
> Jordan Crouse wrote:
> >>>
> >> Oh bugger, looks like this one might be genuinely my fault after all.
> >> The ID check in the new code is buggy.
> >>
> >> Can you please test this revised patch out (against current -git)?
> >
> >
> > That looks the same as the previous patch you sent?
> >
>
> Sorry, this is the right one...
>
> -hpa
> diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
> index bccaa1c..2f37568 100644
> --- a/arch/i386/boot/memory.c
> +++ b/arch/i386/boot/memory.c
> @@ -28,11 +28,10 @@ static int detect_memory_e820(void)
>
> do {
> size = sizeof(struct e820entry);
> - id = SMAP;
> asm("int $0x15; setc %0"
> - : "=am" (err), "+b" (next), "+d" (id), "+c" (size),
> + : "=dm" (err), "+b" (next), "=a" (id), "+c" (size),
> "=m" (*desc)
> - : "D" (desc), "a" (0xe820));
> + : "D" (desc), "d" (SMAP), "a" (0xe820));
>
> /* Some BIOSes stop returning SMAP in the middle of
> the search loop. We don't know exactly how the BIOS
Worked, but that just raises more questions. Why didn't more x86 boxes
break or, alternatively, why did a new version of the BIOS fix the problem?
I guess we shouldn't look a gift horse in the mouth. Or something.
Thanks very much for your help.
Jordan
--
Jordan Crouse
Systems Software Development Engineer
Advanced Micro Devices, Inc.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: More E820 brokenness
2007-09-27 23:54 ` Jordan Crouse
@ 2007-09-28 0:12 ` H. Peter Anvin
2007-09-28 13:05 ` Rafael J. Wysocki
0 siblings, 1 reply; 11+ messages in thread
From: H. Peter Anvin @ 2007-09-28 0:12 UTC (permalink / raw)
To: Jordan Crouse
Cc: jkeating, Joerg Pommnitz, Chuck Ebbert, Linux Kernel Mailing List,
Andi Kleen
Jordan Crouse wrote:
>
> Worked, but that just raises more questions. Why didn't more x86 boxes
> break or, alternatively, why did a new version of the BIOS fix the problem?
> I guess we shouldn't look a gift horse in the mouth. Or something.
>
Why didn't more x86 boxes break... well, it's pretty natural an
implementation of the BIOS to not clobber registers that aren't outputs.
Arguably the BIOSes that do are still buggy, since there isn't a
well-defined calling sequence for the BIOS and the convention that has
evolved is "don't clobber anything unless it's an output."
It's still wrong, however, especially since it means omitting the *real*
SMAP check.
-hpa
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: More E820 brokenness
2007-09-28 0:12 ` H. Peter Anvin
@ 2007-09-28 13:05 ` Rafael J. Wysocki
0 siblings, 0 replies; 11+ messages in thread
From: Rafael J. Wysocki @ 2007-09-28 13:05 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Jordan Crouse, jkeating, Joerg Pommnitz, Chuck Ebbert,
Linux Kernel Mailing List, Andi Kleen
On Friday, 28 September 2007 02:12, H. Peter Anvin wrote:
> Jordan Crouse wrote:
> >
> > Worked, but that just raises more questions. Why didn't more x86 boxes
> > break or, alternatively, why did a new version of the BIOS fix the problem?
> > I guess we shouldn't look a gift horse in the mouth. Or something.
> >
>
> Why didn't more x86 boxes break... well, it's pretty natural an
> implementation of the BIOS to not clobber registers that aren't outputs.
> Arguably the BIOSes that do are still buggy, since there isn't a
> well-defined calling sequence for the BIOS and the convention that has
> evolved is "don't clobber anything unless it's an output."
>
> It's still wrong, however, especially since it means omitting the *real*
> SMAP check.
I'd like to update http://bugzilla.kernel.org/show_bug.cgi?id=9086 with correct
information.
Should I add a pointer to the patch from your previous message to it?
Greetings,
Rafael
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2007-09-28 12:50 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-09-27 22:17 More E820 brokenness H. Peter Anvin
2007-09-27 22:33 ` Jordan Crouse
2007-09-27 22:47 ` H. Peter Anvin
2007-09-27 23:15 ` Jordan Crouse
2007-09-27 23:22 ` H. Peter Anvin
2007-09-27 23:27 ` H. Peter Anvin
2007-09-27 23:34 ` Jordan Crouse
2007-09-27 23:36 ` H. Peter Anvin
2007-09-27 23:54 ` Jordan Crouse
2007-09-28 0:12 ` H. Peter Anvin
2007-09-28 13:05 ` Rafael J. Wysocki
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox