* Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800
@ 2007-09-26 10:56 Joerg Pommnitz
2007-09-26 14:10 ` H. Peter Anvin
0 siblings, 1 reply; 11+ messages in thread
From: Joerg Pommnitz @ 2007-09-26 10:56 UTC (permalink / raw)
To: Jordan Crouse; +Cc: cebbert, linux-kernel, hpa
Hello all,
this is what git bisect told me about the problem:
jpo@jpo-laptop:~/linux-2.6$ git bisect good
4fd06960f120e02e9abc802a09f9511c400042a5 is first bad commit
commit 4fd06960f120e02e9abc802a09f9511c400042a5
Author: H. Peter Anvin <hpa@zytor.com>
Date: Wed Jul 11 12:18:56 2007 -0700
Use the new x86 setup code for i386
This patch hooks the new x86 setup code into the Makefile machinery. It
also adapts boot/tools/build.c to a two-file (as opposed to three-file)
universe, and simplifies it substantially.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
:040000 040000 6560eb5b7e40d93813276544bced8c478f9067f5 fe5f90d9ca08e526559815789175602ba2c51743 M arch
--
Regards
Joerg
----- Original Message -----
From: Jordan Crouse <jordan.crouse@amd.com>
To: Joerg Pommnitz <pommnitz@yahoo.com>
CC: cebbert@redhat.com; linux-kernel@vger.kernel.org
Sent: Tuesday, 25 September 2007, 17:04:52
Subject: Re: Problems with 2.6.23-rc6 on AMD Geode LX800
On 25/09/07 01:38 -0700, Joerg Pommnitz wrote:
> Chuck, Jordan,
> thanks for taking an interest in this problem. As suggested by Jordan I tried a new
> BIOS revision from
> http://www.digitallogic.ch/index.php?id=256&dir=/MSEP800%20-%20SM800PCX%20%20-%20MPC20%20-%20MPC21&mountpoint=23
>
> Unfortunately the kernel still fails to boot in the same way.
You'll have to contact Digital Logic and have them check with the BIOS vendor
to see if the fix was made in that version or not. I don't have that
particular board, so I can't try out the fixes here.
I'm still trying to track down the particulars of the fix from the BIOS
vendor. I'll let you know.
> Do you still need the disassembled reserve_bootmem_core?
Sure - you might as well - just to make sure it's the same problem.
Jordan
--
Jordan Crouse
Systems Software Development Engineer
Advanced Micro Devices, Inc.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800
2007-09-26 10:56 Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800 Joerg Pommnitz
@ 2007-09-26 14:10 ` H. Peter Anvin
2007-09-26 15:41 ` Jordan Crouse
0 siblings, 1 reply; 11+ messages in thread
From: H. Peter Anvin @ 2007-09-26 14:10 UTC (permalink / raw)
To: Joerg Pommnitz; +Cc: Jordan Crouse, cebbert, linux-kernel
Joerg Pommnitz wrote:
> Hello all,
> this is what git bisect told me about the problem:
>
> jpo@jpo-laptop:~/linux-2.6$ git bisect good
> 4fd06960f120e02e9abc802a09f9511c400042a5 is first bad commit
> commit 4fd06960f120e02e9abc802a09f9511c400042a5
> Author: H. Peter Anvin <hpa@zytor.com>
> Date: Wed Jul 11 12:18:56 2007 -0700
>
> Use the new x86 setup code for i386
>
> This patch hooks the new x86 setup code into the Makefile machinery. It
> also adapts boot/tools/build.c to a two-file (as opposed to three-file)
> universe, and simplifies it substantially.
>
> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>
> :040000 040000 6560eb5b7e40d93813276544bced8c478f9067f5 fe5f90d9ca08e526559815789175602ba2c51743 M arch
>
There is something very fishy.
The only documentation you've given us so far is a screen shot which
contained a message ("BIOS data check successful") which doesn't occur
in the kernel.
The loader string doesn't look all that familiar either; it looks like
an extremely old version of SYSLINUX, but that doesn't contain that
message either.
INT 6 is #UD, the undefined instruction exception. This is consistent with:
> It's hitting a bug - specifically (from bootmem.c:125):
> BUG_ON(PFN_DOWN(addr) >= bdata->node_low_pfn);
However, all that tells us is that reserve_bootmem_core() was either
called with a bad address or bdata->node_low_pfn is garbage. In
particular, without knowing how it got there it's hard to know for sure.
Could you send me the boot messages from a working kernel boot?
-hpa
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800
2007-09-26 14:10 ` H. Peter Anvin
@ 2007-09-26 15:41 ` Jordan Crouse
2007-09-26 16:57 ` H. Peter Anvin
2007-09-26 19:14 ` H. Peter Anvin
0 siblings, 2 replies; 11+ messages in thread
From: Jordan Crouse @ 2007-09-26 15:41 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Joerg Pommnitz, cebbert, linux-kernel
On 26/09/07 07:10 -0700, H. Peter Anvin wrote:
> Joerg Pommnitz wrote:
> > Hello all,
> > this is what git bisect told me about the problem:
> >
> > jpo@jpo-laptop:~/linux-2.6$ git bisect good
> > 4fd06960f120e02e9abc802a09f9511c400042a5 is first bad commit
> > commit 4fd06960f120e02e9abc802a09f9511c400042a5
> > Author: H. Peter Anvin <hpa@zytor.com>
> > Date: Wed Jul 11 12:18:56 2007 -0700
> >
> > Use the new x86 setup code for i386
> >
> > This patch hooks the new x86 setup code into the Makefile machinery. It
> > also adapts boot/tools/build.c to a two-file (as opposed to three-file)
> > universe, and simplifies it substantially.
> >
> > Signed-off-by: H. Peter Anvin <hpa@zytor.com>
> > Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> >
> > :040000 040000 6560eb5b7e40d93813276544bced8c478f9067f5 fe5f90d9ca08e526559815789175602ba2c51743 M arch
> >
>
> There is something very fishy.
>
> The only documentation you've given us so far is a screen shot which
> contained a message ("BIOS data check successful") which doesn't occur
> in the kernel.
>
> The loader string doesn't look all that familiar either; it looks like
> an extremely old version of SYSLINUX, but that doesn't contain that
> message either.
>
> INT 6 is #UD, the undefined instruction exception. This is consistent with:
>
> > Its hitting a bug - specifically (from bootmem.c:125):
> > BUG_ON(PFN_DOWN(addr) >= bdata->node_low_pfn);
>
> However, all that tells us is that reserve_bootmem_core() was either
> called with a bad address or bdata->node_low_pfn is garbage. In
> particular, without knowing how it got there it's hard to know for sure.
/me swings a +5 JTAG debugger
It's the latter - max_pfn as read by find_max_pfn() in arch/i386/e820.c
is being set to 0x9F (just under 640k) in the broken case, due to the
e820 map looking something like this:
Address  Size     Type
00000000 0009FC00 1
0009FC00 00000400 2
000E0000 00002000 2
(Yep, that's it - that's the list. e820.nr_map is indeed 3).
Long story short, bdata->node_low_pfn gets set to 0x9F, and when we
try to allocate the bootmem bitmap (at _pa_symbol(_text), which is
page 0x100), then the system gets appropriately angry.
As background, I'm using syslinux 3.36 as my loader here - I've used this
exact same version for a very long time, so I don't blame it in the least.
Something is getting confused in the early kernel, and whatever that
something is, a still unknown change in a newer version of the BIOS
fixed it. The search goes on.
Jordan
--
Jordan Crouse
Systems Software Development Engineer
Advanced Micro Devices, Inc.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800
2007-09-26 15:41 ` Jordan Crouse
@ 2007-09-26 16:57 ` H. Peter Anvin
2007-09-26 19:14 ` H. Peter Anvin
1 sibling, 0 replies; 11+ messages in thread
From: H. Peter Anvin @ 2007-09-26 16:57 UTC (permalink / raw)
To: Jordan Crouse; +Cc: Joerg Pommnitz, cebbert, linux-kernel
Jordan Crouse wrote:
>
> As background, I'm using syslinux 3.36 as my loader here - I've used this
> exact same version for a very long time, so I don't blame it in the least.
> Something is getting confused in the early kernel, and whatever that
> something is, a still unknown change in a newer version of the BIOS
> fixed it. The search goes on.
>
OK, we should put printf's in arch/i386/boot/memory.c and see what
actually gets read out from the BIOS. This could either be a BIOS
problem or a bug in memory.c (or a bug elsewhere in the code that the
change in memory.c triggers, but that seems less likely.)
-hpa
P.S. Are you guys in the Bay Area by any chance?
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800
2007-09-26 15:41 ` Jordan Crouse
2007-09-26 16:57 ` H. Peter Anvin
@ 2007-09-26 19:14 ` H. Peter Anvin
2007-09-26 20:58 ` Jordan Crouse
1 sibling, 1 reply; 11+ messages in thread
From: H. Peter Anvin @ 2007-09-26 19:14 UTC (permalink / raw)
To: Jordan Crouse; +Cc: Joerg Pommnitz, cebbert, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 983 bytes --]
Jordan Crouse wrote:
>
> It's the latter - max_pfn as read by find_max_pfn() in arch/i386/e820.c
> is being set to 0x9F (just under 640k) in the broken case, due to the
> e820 map looking something like this:
>
> Address  Size     Type
> 00000000 0009FC00 1
> 0009FC00 00000400 2
> 000E0000 00002000 2
>
> (Yep, that's it - that's the list. e820.nr_map is indeed 3).
>
> Long story short, bdata->node_low_pfn gets set to 0x9F, and when we
> try to allocate the bootmem bitmap (at _pa_symbol(_text), which is
> page 0x100), then the system gets appropriately angry.
>
> As background, I'm using syslinux 3.36 as my loader here - I've used this
> exact same version for a very long time, so I don't blame it in the least.
> Something is getting confused in the early kernel, and whatever that
> something is, a still unknown change in a newer version of the BIOS
> fixed it. The search goes on.
>
Please try the following debug patch to let us know what is going on.
-hpa
[-- Attachment #2: diff --]
[-- Type: text/plain, Size: 488 bytes --]
diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
index 1a2e62d..a0ccf29 100644
--- a/arch/i386/boot/memory.c
+++ b/arch/i386/boot/memory.c
@@ -33,6 +33,12 @@ static int detect_memory_e820(void)
 		      "=m" (*desc)
 		    : "D" (desc), "a" (0xe820));

+		printf("e820: err %d id 0x%08x next %u %08x:%08x %u\n",
+		       err, id, next,
+		       (unsigned int)desc->addr,
+		       (unsigned int)desc->size,
+		       desc->type);
+
 		if (err || id != SMAP)
 			break;
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800
2007-09-26 19:14 ` H. Peter Anvin
@ 2007-09-26 20:58 ` Jordan Crouse
2007-09-26 21:04 ` H. Peter Anvin
0 siblings, 1 reply; 11+ messages in thread
From: Jordan Crouse @ 2007-09-26 20:58 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Joerg Pommnitz, cebbert, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 2102 bytes --]
On 26/09/07 12:14 -0700, H. Peter Anvin wrote:
> Please try the following debug patch to let us know what is going on.
>
> -hpa
> diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
> index 1a2e62d..a0ccf29 100644
> --- a/arch/i386/boot/memory.c
> +++ b/arch/i386/boot/memory.c
> @@ -33,6 +33,12 @@ static int detect_memory_e820(void)
> "=m" (*desc)
> : "D" (desc), "a" (0xe820));
>
> + printf("e820: err %d id 0x%08x next %u %08x:%08x %u\n",
> + err, id, next,
> + (unsigned int)desc->addr,
> + (unsigned int)desc->size,
> + desc->type);
> +
> if (err || id != SMAP)
> break;
Okay, we have clarity. Here is the output
e820: err 0 id 0x534d4150 next 15476 00000000:0009fc00 1
e820: err 0 id 0x534d4150 next 15496 0009fc00:00000400 2
e820: err 0 id 0x534d4150 next 15516 000e0000:00020000 2
e820: err 0 id 0x0e7b0000 next 11536 00100000:0e6b0000 1
In the last entry, id is obviously wrong (it should be 'SMAP' or
0x534d4150). This is the BIOS bug.
Here's the reason why this bothers us now. In the old assembly code,
if the returned ID wasn't equal to 'SMAP', we jumped straight to the e801
code. In the new code in memory.c, if id != SMAP, we break out of the
int15 loop, and return boot_params.e820_entries, which in our case is
3. detect_memory() considers this to be successful, and no attempt to
parse e801 is made.
So that's where the problem is - in the old code with the buggy BIOS, we
punted to reading the e801 information, and that was enough to keep us
going. In the new code, we allow a partial table to be used, and we
blow up.
Attached is a patch to fix this - it returns -1 on error, and only sets
boot_params.e820_entries to be non-zero if we have something useful
in it. This punts the detection to the e801 code, which is
then successful.
This fixes the problem with the DB800, and so it probably should
for the other Geode platforms affected by this.
Many thanks to hpa for the guiding hand.
Jordan
--
Jordan Crouse
Systems Software Development Engineer
Advanced Micro Devices, Inc.
[-- Attachment #2: memory-bail-on-error.patch --]
[-- Type: text/plain, Size: 1168 bytes --]
[i386]: Return an error if the e820 detection goes bad
From: Jordan Crouse <jordan.crouse@amd.com>
Change the e820 code to always return an error if something
bad happens while reading the e820 map. This matches the
old code behavior, and allows brain-dead e820 implementations
to still work.
Signed-off-by: Jordan Crouse <jordan.crouse@amd.com>
---
arch/i386/boot/memory.c | 9 +++++----
1 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
index 1a2e62d..4c7f0f6 100644
--- a/arch/i386/boot/memory.c
+++ b/arch/i386/boot/memory.c
@@ -22,7 +22,7 @@ static int detect_memory_e820(void)
 {
 	u32 next = 0;
 	u32 size, id;
-	u8 err;
+	u8 err, count = 0;
 	struct e820entry *desc = boot_params.e820_map;

 	do {
@@ -34,13 +34,14 @@ static int detect_memory_e820(void)
 		    : "D" (desc), "a" (0xe820));

 		if (err || id != SMAP)
-			break;
+			return -1;

-		boot_params.e820_entries++;
+		count++;
 		desc++;
 	} while (next && boot_params.e820_entries < E820MAX);

-	return boot_params.e820_entries;
+	boot_params.e820_entries = count;
+	return count;
 }

 static int detect_memory_e801(void)
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800
2007-09-26 20:58 ` Jordan Crouse
@ 2007-09-26 21:04 ` H. Peter Anvin
2007-09-26 21:15 ` Jordan Crouse
0 siblings, 1 reply; 11+ messages in thread
From: H. Peter Anvin @ 2007-09-26 21:04 UTC (permalink / raw)
To: Jordan Crouse; +Cc: Joerg Pommnitz, cebbert, linux-kernel
Jordan Crouse wrote:
> On 26/09/07 12:14 -0700, H. Peter Anvin wrote:
>> Please try the following debug patch to let us know what is going on.
>>
>> -hpa
>
>> diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
>> index 1a2e62d..a0ccf29 100644
>> --- a/arch/i386/boot/memory.c
>> +++ b/arch/i386/boot/memory.c
>> @@ -33,6 +33,12 @@ static int detect_memory_e820(void)
>> "=m" (*desc)
>> : "D" (desc), "a" (0xe820));
>>
>> + printf("e820: err %d id 0x%08x next %u %08x:%08x %u\n",
>> + err, id, next,
>> + (unsigned int)desc->addr,
>> + (unsigned int)desc->size,
>> + desc->type);
>> +
>> if (err || id != SMAP)
>> break;
>
> Okay, we have clarity. Here is the output
>
> e820: err 0 id 0x534d4150 next 15476 00000000:0009fc00 1
> e820: err 0 id 0x534d4150 next 15496 0009fc00:00000400 2
> e820: err 0 id 0x534d4150 next 15516 000e0000:00020000 2
> e820: err 0 id 0x0e7b0000 next 11536 00100000:0e6b0000 1
>
> In the last entry, id is obviously wrong (it should be 'SMAP' or
> 0x534d4150). This is the BIOS bug.
>
> Here's the reason why this bothers us now. In the old assembly code,
> if the returned ID wasn't equal to 'SMAP', we jumped straight to the e801
> code. In the new code in memory.c, if id != SMAP, we break out of the
> int15 loop, and return boot_params.e820_entries, which in our case is
> 3. detect_memory() considers this to be successful, and no attempt to
> parse e801 is made.
>
> So that's where the problem is - in the old code with the buggy BIOS, we
> punted to reading the e801 information, and that was enough to keep us
> going. In the new code, we allow a partial table to be used, and we
> blow up.
>
> Attached is a patch to fix this - it returns -1 on error, and only sets
> boot_params.e820_entries to be non-zero if we have something useful
> in it. This punts the detection to the e801 code, which is
> then successful.
>
> This fixes the problem with the DB800, and so it probably should
> for the other Geode platforms affected by this.
>
> Many thanks to hpa for the guiding hand.
>
This patch is obviously wrong. There are a lot of e820 BIOSen out there
that terminate with CF=1, and that is a legitimate termination condition
for e820. Now, as far as what to do when id != SMAP, it probably is
still the right thing to do; since the BIOS vendor couldn't get something
that elementary correct, we shouldn't trust the data.
I'll write up a corrected patch.
-hpa
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800
2007-09-26 21:04 ` H. Peter Anvin
@ 2007-09-26 21:15 ` Jordan Crouse
2007-09-26 21:20 ` H. Peter Anvin
0 siblings, 1 reply; 11+ messages in thread
From: Jordan Crouse @ 2007-09-26 21:15 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Joerg Pommnitz, cebbert, linux-kernel
On 26/09/07 14:04 -0700, H. Peter Anvin wrote:
> Jordan Crouse wrote:
> > On 26/09/07 12:14 -0700, H. Peter Anvin wrote:
> >> Please try the following debug patch to let us know what is going on.
> >>
> >> -hpa
> >
> >> diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
> >> index 1a2e62d..a0ccf29 100644
> >> --- a/arch/i386/boot/memory.c
> >> +++ b/arch/i386/boot/memory.c
> >> @@ -33,6 +33,12 @@ static int detect_memory_e820(void)
> >> "=m" (*desc)
> >> : "D" (desc), "a" (0xe820));
> >>
> >> + printf("e820: err %d id 0x%08x next %u %08x:%08x %u\n",
> >> + err, id, next,
> >> + (unsigned int)desc->addr,
> >> + (unsigned int)desc->size,
> >> + desc->type);
> >> +
> >> if (err || id != SMAP)
> >> break;
> >
> > Okay, we have clarity. Here is the output
> >
> > e820: err 0 id 0x534d4150 next 15476 00000000:0009fc00 1
> > e820: err 0 id 0x534d4150 next 15496 0009fc00:00000400 2
> > e820: err 0 id 0x534d4150 next 15516 000e0000:00020000 2
> > e820: err 0 id 0x0e7b0000 next 11536 00100000:0e6b0000 1
> >
> > In the last entry, id is obviously wrong (it should be 'SMAP' or
> > 0x534d4150). This is the BIOS bug.
> >
> > Here's the reason why this bothers us now. In the old assembly code,
> > if the returned ID wasn't equal to 'SMAP', we jumped straight to the e801
> > code. In the new code in memory.c, if id != SMAP, we break out of the
> > int15 loop, and return boot_params.e820_entries, which in our case is
> > 3. detect_memory() considers this to be successful, and no attempt to
> > parse e801 is made.
> >
> > So thats where the problem is - in the old code with the buggy BIOS, we
> > punted to reading the e801 information, and that was enough to keep us
> > going. In the new code, we allow a partial table to be used, and we
> > blow up.
> >
> > Attached is a patch to fix this - it returns -1 on error, and only sets
> > boot_params.e820_entries to be non-zero if we have something useful
> > in it. This punts the detection to the e801 code, which then is
> > then successful.
> >
> > This fixes the problem with the DB800, and so it probably should
> > with the other Geode platforms affected by this.
> >
> > Many thanks to hpa for the guiding hand.
> >
>
> This patch is obviously wrong. There are a lot of e820 BIOSen out there
> that terminate with CF=1, and that is a legitimate termination condition
> for e820. Now, as far as what to do when id != SMAP, it probably is
> still the right thing to do; since the BIOS vendor couldn't get something
> that elementary correct, we shouldn't trust the data.
>
> I'll write up a corrected patch.
Hmm - the old code seems to fall back to e801 when CF was set too:
	int	$0x15			# make the call
	jc	bail820			# fall to e801 if it fails

	cmpl	$SMAP, %eax		# check the return is `SMAP'
	jne	bail820			# fall to e801 if it fails
That's not to say that the old code was correct, mind you. Failing on a
bad ID and returning without error on a set CF seems good to me.
Jordan
--
Jordan Crouse
Systems Software Development Engineer
Advanced Micro Devices, Inc.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800
2007-09-26 21:15 ` Jordan Crouse
@ 2007-09-26 21:20 ` H. Peter Anvin
2007-09-26 21:30 ` Jordan Crouse
0 siblings, 1 reply; 11+ messages in thread
From: H. Peter Anvin @ 2007-09-26 21:20 UTC (permalink / raw)
To: Jordan Crouse; +Cc: Joerg Pommnitz, cebbert, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 544 bytes --]
Jordan Crouse wrote:
>
> Hmm - the old code seems to fail to e801 when CF was set too:
>
> int $0x15 # make the call
> jc bail820 # fall to e801 if it fails
>
> cmpl $SMAP, %eax # check the return is `SMAP'
> jne bail820 # fall to e801 if it fails
>
> Thats not to say that the old code was correct, mind you. Failing on a
> bad ID and returning without error on a set CF seems to be good to me.
>
Testing this patch now:
[-- Attachment #2: 0001-x86-setup-Handle-case-of-improperly-terminated-E82.patch --]
[-- Type: text/x-patch, Size: 2246 bytes --]
From 2efa33f81ef56e7700c09a3d8a881c96692149e5 Mon Sep 17 00:00:00 2001
From: H. Peter Anvin <hpa@zytor.com>
Date: Wed, 26 Sep 2007 14:11:43 -0700
Subject: [PATCH] [x86 setup] Handle case of improperly terminated E820 chain
At least one system (a Geode system with a Digital Logic BIOS) has
been found which suddenly stops reporting the SMAP signature when
reading the E820 memory chain. We can't know what, exactly, broke in
the BIOS, so if we detect this situation, declare the E820 data
unusable and fall back to E801.
Also, revert to original behavior of always probing all memory
methods; that way all the memory information is available to the
kernel.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Jordan Crouse <jordan.crouse@amd.com>
Cc: Joerg Pommnitz <pommnitz@yahoo.com>
---
arch/i386/boot/memory.c | 30 +++++++++++++++++++++++-------
1 files changed, 23 insertions(+), 7 deletions(-)
diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
index 1a2e62d..bccaa1c 100644
--- a/arch/i386/boot/memory.c
+++ b/arch/i386/boot/memory.c
@@ -20,6 +20,7 @@

 static int detect_memory_e820(void)
 {
+	int count = 0;
 	u32 next = 0;
 	u32 size, id;
 	u8 err;
@@ -33,14 +34,24 @@ static int detect_memory_e820(void)
 		      "=m" (*desc)
 		    : "D" (desc), "a" (0xe820));

-		if (err || id != SMAP)
+		/* Some BIOSes stop returning SMAP in the middle of
+		   the search loop. We don't know exactly how the BIOS
+		   screwed up the map at that point, we might have a
+		   partial map, the full map, or complete garbage, so
+		   just return failure. */
+		if (id != SMAP) {
+			count = 0;
 			break;
+		}

-		boot_params.e820_entries++;
+		if (err)
+			break;
+
+		count++;
 		desc++;
-	} while (next && boot_params.e820_entries < E820MAX);
+	} while (next && count < E820MAX);

-	return boot_params.e820_entries;
+	return boot_params.e820_entries = count;
 }

 static int detect_memory_e801(void)
@@ -89,11 +100,16 @@ static int detect_memory_88(void)

 int detect_memory(void)
 {
+	int err = -1;
+
 	if (detect_memory_e820() > 0)
-		return 0;
+		err = 0;

 	if (!detect_memory_e801())
-		return 0;
+		err = 0;
+
+	if (!detect_memory_88())
+		err = 0;

-	return detect_memory_88();
+	return err;
 }
--
1.5.3.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800
2007-09-26 21:20 ` H. Peter Anvin
@ 2007-09-26 21:30 ` Jordan Crouse
0 siblings, 0 replies; 11+ messages in thread
From: Jordan Crouse @ 2007-09-26 21:30 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Joerg Pommnitz, cebbert, linux-kernel
On 26/09/07 14:20 -0700, H. Peter Anvin wrote:
> Testing this patch now:
>
> From 2efa33f81ef56e7700c09a3d8a881c96692149e5 Mon Sep 17 00:00:00 2001
> From: H. Peter Anvin <hpa@zytor.com>
> Date: Wed, 26 Sep 2007 14:11:43 -0700
> Subject: [PATCH] [x86 setup] Handle case of improperly terminated E820 chain
>
> At least one system (a Geode system with a Digital Logic BIOS) has
> been found which suddenly stops reporting the SMAP signature when
> reading the E820 memory chain. We can't know what, exactly, broke in
> the BIOS, so if we detect this situation, declare the E820 data
> unusable and fall back to E801.
>
> Also, revert to original behavior of always probing all memory
> methods; that way all the memory information is available to the
> kernel.
>
> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
> Cc: Jordan Crouse <jordan.crouse@amd.com>
> Cc: Joerg Pommnitz <pommnitz@yahoo.com>
> ---
> arch/i386/boot/memory.c | 30 +++++++++++++++++++++++-------
> 1 files changed, 23 insertions(+), 7 deletions(-)
>
> diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
> index 1a2e62d..bccaa1c 100644
> --- a/arch/i386/boot/memory.c
> +++ b/arch/i386/boot/memory.c
> @@ -20,6 +20,7 @@
>
> static int detect_memory_e820(void)
> {
> + int count = 0;
> u32 next = 0;
> u32 size, id;
> u8 err;
> @@ -33,14 +34,24 @@ static int detect_memory_e820(void)
> "=m" (*desc)
> : "D" (desc), "a" (0xe820));
>
> - if (err || id != SMAP)
> + /* Some BIOSes stop returning SMAP in the middle of
> + the search loop. We don't know exactly how the BIOS
> + screwed up the map at that point, we might have a
> + partial map, the full map, or complete garbage, so
> + just return failure. */
> + if (id != SMAP) {
> + count = 0;
> break;
> + }
>
> - boot_params.e820_entries++;
> + if (err)
> + break;
> +
> + count++;
> desc++;
> - } while (next && boot_params.e820_entries < E820MAX);
> + } while (next && count < E820MAX);
>
> - return boot_params.e820_entries;
> + return boot_params.e820_entries = count;
> }
>
> static int detect_memory_e801(void)
> @@ -89,11 +100,16 @@ static int detect_memory_88(void)
>
> int detect_memory(void)
> {
> + int err = -1;
> +
> if (detect_memory_e820() > 0)
> - return 0;
> + err = 0;
>
> if (!detect_memory_e801())
> - return 0;
> + err = 0;
> +
> + if (!detect_memory_88())
> + err = 0;
>
> - return detect_memory_88();
> + return err;
> }
> --
> 1.5.3.1
>
Works here with the buggy BIOS.
Acked-by: Jordan Crouse <jordan.crouse@amd.com>
Thanks.
--
Jordan Crouse
Systems Software Development Engineer
Advanced Micro Devices, Inc.
^ permalink raw reply [flat|nested] 11+ messages in thread
* RE: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800
@ 2007-09-26 15:28 Joerg Pommnitz
0 siblings, 0 replies; 11+ messages in thread
From: Joerg Pommnitz @ 2007-09-26 15:28 UTC (permalink / raw)
To: linux-kernel, hpa
[-- Attachment #1: Type: text/plain, Size: 1439 bytes --]
> There is something very fishy.
>
> The only documentation you've given us so far is a screen shot which
> contained a message ("BIOS data check successful") which doesn't occur
> in the kernel.
>
> The loader string doesn't look all that familiar either; it looks like
> an extremely old version of SYSLINUX, but that doesn't contain that
> message either.
The boot loader is LILO from Ubuntu 7.04, so it should be fairly recent.
> INT 6 is #UD, the undefined instruction exception. This is consistent with:
>
> > Its hitting a bug - specifically (from bootmem.c:125):
> > BUG_ON(PFN_DOWN(addr) >= bdata->node_low_pfn);
>
> However, all that tells us is that reserve_bootmem_core() was either
> called with a bad address or bdata->node_low_pfn is garbage. In
> particular, without knowing how it got there it's hard to know for sure.
>
> Could you send me the boot messages from a working kernel boot?
Attached is the boot log I get when the last patch applied is
f2d98ae63dc64dedb00499289e13a50677f771f9, i.e. "Linker script for the
new x86 setup code".
The kernel is directly from "git bisect, make defconfig, make", no local
changes or strange patches applied. Build environment is plain Ubuntu-7.04.
--
Kind regards
Joerg
[-- Attachment #2: dmesg.txt --]
[-- Type: text/plain, Size: 10391 bytes --]
Linux version 2.6.22-gf2d98ae6 (jpo@jpo-laptop) (gcc version 4.1.2 (Ubuntu 4.1.2-0ubuntu4)) #12 SMP Wed Sep 26 12:45:21 CEST 2007
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000001e7b0000 (usable)
BIOS-e820: 000000001e7b0000 - 000000001e7bffc0 (ACPI data)
BIOS-e820: 000000001e7bffc0 - 000000001e7c0000 (ACPI NVS)
BIOS-e820: 0000000040400000 - 0000000040440004 (reserved)
BIOS-e820: 00000000f0000000 - 0000000100000000 (reserved)
0MB HIGHMEM available.
487MB LOWMEM available.
Entering add_active_range(0, 0, 124848) 0 entries of 256 used
Zone PFN ranges:
DMA 0 -> 4096
Normal 4096 -> 124848
HighMem 124848 -> 124848
early_node_map[1] active PFN ranges
0: 0 -> 124848
On node 0 totalpages: 124848
DMA zone: 32 pages used for memmap
DMA zone: 0 pages reserved
DMA zone: 4064 pages, LIFO batch:0
Normal zone: 943 pages used for memmap
Normal zone: 119809 pages, LIFO batch:31
HighMem zone: 0 pages used for memmap
DMI not present or invalid.
Using APIC driver default
ACPI: RSDP 000E9010, 0014 (r0 OID_00)
ACPI: RSDT 1E7B2AE0, 0030 (r1 AMD RSDT_000 31303030 AMD 31303030)
ACPI: FACP 1E7B29E0, 0084 (r1 AMD FACP_000 31303030 AMD 31303030)
ACPI: DSDT 1E7B0000, 29D6 (r1 INSYDE CS553x 1007 INTL 20030122)
ACPI: FACS 1E7BFFC0, 0040
ACPI: BOOT 1E7B2A70, 0028 (r1 AMD BOOT_000 31303030 AMD 31303030)
ACPI: DBGP 1E7B2AA0, 0034 (r1 AMD DBGP_000 31303030 AMD 31303030)
ACPI: no DMI BIOS year, acpi=force is required to enable ACPI
ACPI: Disabling ACPI support
Allocating PCI resources starting at 50000000 (gap: 40440004:afbbfffc)
Built 1 zonelists. Total pages: 123873
Kernel command line: BOOT_IMAGE=Linux2623 ro root=341
No local APIC present or hardware disabled
mapped APIC to ffffd000 (013dc000)
Initializing CPU#0
PID hash table entries: 2048 (order: 11, 8192 bytes)
Detected 498.434 MHz processor.
Console: colour VGA+ 80x25
Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
Memory: 488944k/499392k available (3041k kernel code, 9872k reserved, 1517k data, 292k init, 0k highmem)
virtual kernel memory layout:
fixmap : 0xffe16000 - 0xfffff000 (1956 kB)
pkmap : 0xff800000 - 0xffc00000 (4096 kB)
vmalloc : 0xdf000000 - 0xff7fe000 ( 519 MB)
lowmem : 0xc0000000 - 0xde7b0000 ( 487 MB)
.init : 0xc057b000 - 0xc05c4000 ( 292 kB)
.data : 0xc03f86e0 - 0xc0573ecc (1517 kB)
.text : 0xc0100000 - 0xc03f86e0 (3041 kB)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 998.38 BogoMIPS (lpj=1996769)
Mount-cache hash table entries: 512
CPU: After generic identify, caps: 0088a93d c0c0a13d 00000000 00000000 00000000 00000000 00000000 00000000
CPU: L1 I Cache: 64K (32 bytes/line), D cache 64K (32 bytes/line)
CPU: L2 Cache: 128K (32 bytes/line)
CPU: After all inits, caps: 0088a93d c0c0a13d 00000000 00000000 00000000 00000000 00000000 00000000
Compat vDSO mapped to ffffe000.
Checking 'hlt' instruction... OK.
SMP alternatives: switching to UP code
Freeing SMP alternatives: 18k freed
CPU0: AMD Geode(TM) Integrated Processor by AMD PCS stepping 02
SMP motherboard not detected.
Local APIC not detected. Using dummy APIC emulation.
Brought up 1 CPUs
NET: Registered protocol family 16
PCI: PCI BIOS revision 2.10 entry at 0xff8b7, last bus=0
PCI: Using configuration type 1
Setting up standard PCI resources
ACPI: Interpreter disabled.
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI: disabled
SCSI subsystem initialized
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
PCI: Probing PCI hardware
PCI: Probing PCI hardware (bus 00)
PCI: Firmware left 0000:00:13.0 e100 interrupts enabled, disabling
PCI: Using IRQ router default [1022/2090] at 0000:00:0f.0
Time: tsc clocksource has been installed.
NET: Registered protocol family 2
IP route cache hash table entries: 4096 (order: 2, 16384 bytes)
TCP established hash table entries: 16384 (order: 5, 196608 bytes)
TCP bind hash table entries: 16384 (order: 5, 131072 bytes)
TCP: Hash tables configured (established 16384 bind 16384)
TCP reno registered
Simple Boot Flag at 0x3f set to 0x1
microcode: CPU0 not a capable Intel processor
IA-32 Microcode Update Driver: v1.14a <tigran@aivazian.fsnet.co.uk>
Total HugeTLB memory allocated, 0
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
io scheduler noop registered
io scheduler anticipatory registered (default)
io scheduler deadline registered
io scheduler cfq registered
Boot video device is 0000:00:01.1
Real Time Clock Driver v1.12ac
AMD Geode RNG detected
Linux agpgart interface v0.102 (c) Dave Jones
Hangcheck: starting hangcheck timer 0.9.0 (tick is 180 seconds, margin is 60 seconds).
Hangcheck: Using get_cycles().
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
serial8250.0: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
serial8250.0: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
serial8250.0: ttyS2 at I/O 0x3e8 (irq = 4) is a 16550A
serial8250.0: ttyS3 at I/O 0x2e8 (irq = 3) is a 16550A
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
loop: module loaded
Intel(R) PRO/1000 Network Driver - version 7.3.20-k2
Copyright (c) 1999-2006 Intel Corporation.
PCI: Guessed IRQ 10 for device 0000:00:12.0
e1000: 0000:00:12.0: e1000_probe: (PCI:33MHz:32-bit) 00:30:59:03:89:5e
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
e100: Intel(R) PRO/100 Network Driver, 3.5.23-k4-NAPI
e100: Copyright(c) 1999-2006 Intel Corporation
PCI: Guessed IRQ 10 for device 0000:00:13.0
PCI: Sharing IRQ 10 with 0000:00:01.1
PCI: Sharing IRQ 10 with 0000:00:01.2
PCI: Sharing IRQ 10 with 0000:00:11.0
e100: eth1: e100_probe: addr 0xef600000, irq 10, MAC addr 00:30:59:03:86:9F
forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.
netconsole: not configured, aborting
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
AMD5536: IDE controller at PCI slot 0000:00:0f.2
AMD5536: chipset revision 1
AMD5536: not 100% native mode: will probe irqs later
AMD5536: 0000:00:0f.2 (rev 01) UDMA100 controller
ide0: BM-DMA at 0xeff0-0xeff7, BIOS settings: hda:pio, hdb:DMA
Probing IDE interface ide0...
hdb: SILICONSYSTEMS INC 1GB, ATA DISK drive
hdb: selected mode 0x22
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdb: max request size: 128KiB
hdb: 2046240 sectors (1047 MB) w/0KiB Cache, CHS=2030/16/63, DMA
hdb: hdb1
3ware Storage Controller device driver for Linux v1.26.02.002.
Fusion MPT base driver 3.04.04
Copyright (c) 1999-2007 LSI Logic Corporation
Fusion MPT SPI Host driver 3.04.04
ieee1394: raw1394: /dev/raw1394 device initialized
usbmon: debugfs is not available
PCI: Guessed IRQ 11 for device 0000:00:0f.5
PCI: Sharing IRQ 11 with 0000:00:0f.4
PCI: Setting latency timer of device 0000:00:0f.5 to 64
ehci_hcd 0000:00:0f.5: EHCI Host Controller
ehci_hcd 0000:00:0f.5: new USB bus registered, assigned bus number 1
ehci_hcd 0000:00:0f.5: irq 11, io mem 0xefb00000
ehci_hcd 0000:00:0f.5: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 4 ports detected
ohci_hcd: 2006 August 04 USB 1.1 'Open' Host Controller (OHCI) Driver
PCI: Guessed IRQ 11 for device 0000:00:0f.4
PCI: Sharing IRQ 11 with 0000:00:0f.5
PCI: Setting latency timer of device 0000:00:0f.4 to 64
ohci_hcd 0000:00:0f.4: OHCI Host Controller
ohci_hcd 0000:00:0f.4: new USB bus registered, assigned bus number 2
ohci_hcd 0000:00:0f.4: irq 11, io mem 0xeff00000
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 4 ports detected
USB Universal Host Controller Interface driver v3.0
usb 1-2: new high speed USB device using ehci_hcd and address 2
usb 1-2: configuration #1 chosen from 1 choice
usb 2-4: new low speed USB device using ohci_hcd and address 2
usb 2-4: configuration #1 chosen from 1 choice
usbcore: registered new interface driver usblp
drivers/usb/class/usblp.c: v0.13: USB Printer Device Class driver
Initializing USB Mass Storage driver...
scsi0 : SCSI emulation for USB Mass Storage devices
usbcore: registered new interface driver usb-storage
USB Mass Storage support registered.
usb-storage: device found at 2
usb-storage: waiting for device to settle before scanning
PNP: No PS/2 controller found. Probing ports directly.
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
mice: PS/2 mouse device common for all mice
input: HID 04d9:1400 as /class/input/input0
input: USB HID v1.10 Keyboard [HID 04d9:1400] on usb-0000:00:0f.4-4
input: HID 04d9:1400 as /class/input/input1
input: USB HID v1.10 Mouse [HID 04d9:1400] on usb-0000:00:0f.4-4
usbcore: registered new interface driver usbhid
drivers/hid/usbhid/hid-core.c: v2.6:USB HID core driver
oprofile: using timer interrupt.
TCP cubic registered
NET: Registered protocol family 1
NET: Registered protocol family 10
IPv6 over IPv4 tunneling driver
NET: Registered protocol family 17
powernow-k8: Processor cpuid 5a2 not supported
Using IPI Shortcut mode
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 292k freed
scsi 0:0:0:0: Direct-Access USB Flash Disk 1100 PQ: 0 ANSI: 0 CCS
sd 0:0:0:0: [sda] 1980416 512-byte hardware sectors (1014 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 43 00 00 00
sd 0:0:0:0: [sda] Assuming drive cache: write through
sd 0:0:0:0: [sda] 1980416 512-byte hardware sectors (1014 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 43 00 00 00
sd 0:0:0:0: [sda] Assuming drive cache: write through
sda: sda1
sd 0:0:0:0: [sda] Attached SCSI removable disk
usb-storage: device scan complete
ADDRCONF(NETDEV_UP): eth0: link is not ready
ADDRCONF(NETDEV_UP): eth1: link is not ready
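
As a reading aid for the "virtual kernel memory layout" block in the log above, here is a minimal sketch that recomputes each region's size from its start and end addresses and compares it with the size the kernel printed. The helper name `check_layout` and the regular expression are illustrative assumptions, not anything from the kernel tree, and the sketch assumes the kernel rounds the printed sizes down to whole kB or MB, which is what these numbers suggest.

```python
import re

# Region lines copied verbatim from the boot log above.
LAYOUT = """\
fixmap : 0xffe16000 - 0xfffff000 (1956 kB)
pkmap : 0xff800000 - 0xffc00000 (4096 kB)
vmalloc : 0xdf000000 - 0xff7fe000 ( 519 MB)
lowmem : 0xc0000000 - 0xde7b0000 ( 487 MB)
.init : 0xc057b000 - 0xc05c4000 ( 292 kB)
.data : 0xc03f86e0 - 0xc0573ecc (1517 kB)
.text : 0xc0100000 - 0xc03f86e0 (3041 kB)
"""

# name : start - end ( size unit )
LINE_RE = re.compile(
    r"(\S+)\s*:\s*(0x[0-9a-f]+)\s*-\s*(0x[0-9a-f]+)\s*\(\s*(\d+)\s*(kB|MB)\)"
)


def check_layout(text):
    """Return {region: (computed_size, reported_size)} in the reported unit.

    Assumption: sizes are rounded down (integer division), not to nearest.
    """
    results = {}
    for match in LINE_RE.finditer(text):
        name, start, end, size, unit = match.groups()
        span_bytes = int(end, 16) - int(start, 16)
        divisor = 1024 if unit == "kB" else 1024 * 1024
        results[name] = (span_bytes // divisor, int(size))
    return results


if __name__ == "__main__":
    for region, (computed, reported) in check_layout(LAYOUT).items():
        status = "ok" if computed == reported else "MISMATCH"
        print(f"{region:8s} computed={computed:5d} reported={reported:5d} {status}")
```

Run against the values above, every region's computed size agrees with the reported one, so the layout in this log is at least internally consistent.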
Thread overview: 11+ messages (newest: 2007-09-26 21:30 UTC)
2007-09-26 10:56 Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800 Joerg Pommnitz
2007-09-26 14:10 ` H. Peter Anvin
2007-09-26 15:41 ` Jordan Crouse
2007-09-26 16:57 ` H. Peter Anvin
2007-09-26 19:14 ` H. Peter Anvin
2007-09-26 20:58 ` Jordan Crouse
2007-09-26 21:04 ` H. Peter Anvin
2007-09-26 21:15 ` Jordan Crouse
2007-09-26 21:20 ` H. Peter Anvin
2007-09-26 21:30 ` Jordan Crouse
-- strict thread matches above, loose matches on Subject: below --
2007-09-26 15:28 Joerg Pommnitz