public inbox for linux-kernel@vger.kernel.org
* PCI Express MMCONFIG and BIOS Bug messages..
@ 2007-04-19  1:25 Robert Hancock
  2007-04-19 16:05 ` Chuck Ebbert
  0 siblings, 1 reply; 7+ messages in thread
From: Robert Hancock @ 2007-04-19  1:25 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andi Kleen

I've seen a lot of systems (including brand new Xeon-based servers from 
IBM and HP) that output messages on boot like:

PCI: BIOS Bug: MCFG area at f0000000 is not E820-reserved
PCI: Not using MMCONFIG.

As I understand it, this is sort of a sanity check mechanism to make 
sure the MCFG address reported is remotely reasonable and intended to be 
used as such. Problem is, I doubt the BIOS authors would agree that this 
constitutes a bug. Microsoft provides a lot of the direction for BIOS 
writers; have a look at the presentation "PCI Express, Windows, And The 
Legacy Transition" from back in 2004:

http://download.microsoft.com/download/1/8/f/18f8cee2-0b64-41f2-893d-a6f2295b40c8/TW04047_WINHEC2004.ppt

On page 14, "Existing Windows - Reserve MMCONFIG":

Existing Windows versions won’t understand MCFG table
  * Backwards-compatible range reservation must be used
Report range in ACPI "Motherboard Resources"
  * _CRS of PNP0C02 node
  * PNP0C02 must be at \_SB scope
  * Range must be marked as consumed
Do not include range in _CRS of PCI root bus
  * If included, OS will assume that this range can be allocated to devices
E820 table/EFI memory map
  * Not necessary to describe MMConfig here
  * For Windows, these are used to describe RAM
  * No harm in including range as reserved either

So Microsoft is explicitly telling the BIOS developers that there is no 
need to reserve the MMCONFIG space in the E820 table because Windows 
doesn't care. On that basis it doesn't seem like a valid check to 
require it to be so reserved, then.
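
As a rough illustration of the kind of check behind the boot message above, here is a small userspace model. It is only a sketch, not the kernel's code: the tuple layout of the map, the helper name all_mapped, the sample map entries, and the 2 MB window being tested are all assumptions made for the example.

```python
# Illustrative model of an "is this range E820-reserved?" check.
# Map layout, helper name, sample entries and window size are assumptions.
E820_RAM = 1
E820_RESERVED = 2

def all_mapped(e820, start, end, wanted_type):
    """Return True if [start, end) is fully covered by entries of wanted_type."""
    for base, size, typ in sorted(e820):
        if typ != wanted_type:
            continue
        entry_end = base + size
        # Ignore entries that do not contain the current start address
        if entry_end <= start or base > start:
            continue
        start = entry_end          # covered up to here, keep going
        if start >= end:
            return True
    return False

# Hypothetical firmware map: RAM plus one reserved block that does not
# include the MCFG window at f0000000.
e820 = [
    (0x00000000, 0x0009fc00, E820_RAM),
    (0x00100000, 0x7fef0000, E820_RAM),
    (0xfec00000, 0x01400000, E820_RESERVED),
]
mcfg_base = 0xf0000000
window = 2 * 1024 * 1024           # assumed size of the probed window

if not all_mapped(e820, mcfg_base, mcfg_base + window, E820_RESERVED):
    print("PCI: BIOS Bug: MCFG area at %x is not E820-reserved" % mcfg_base)
    print("PCI: Not using MMCONFIG.")
```

With a map like this one, a BIOS that follows the slide's advice and leaves MMCONFIG out of E820 fails the check even though the window itself may be perfectly usable.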

Really, I think we should be basing this check on whether the 
corresponding memory range is reserved in the ACPI resources, like 
Windows expects. This does require putting more fingers into ACPI from 
this early boot stage, though..

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/



* Re: PCI Express MMCONFIG and BIOS Bug messages..
  2007-04-19  1:25 PCI Express MMCONFIG and BIOS Bug messages Robert Hancock
@ 2007-04-19 16:05 ` Chuck Ebbert
  2007-04-29  1:22   ` Robert Hancock
  0 siblings, 1 reply; 7+ messages in thread
From: Chuck Ebbert @ 2007-04-19 16:05 UTC (permalink / raw)
  To: Robert Hancock; +Cc: linux-kernel, Andi Kleen

Robert Hancock wrote:
> I've seen a lot of systems (including brand new Xeon-based servers from
> IBM and HP) that output messages on boot like:
> 
> PCI: BIOS Bug: MCFG area at f0000000 is not E820-reserved
> PCI: Not using MMCONFIG.
> 
> 
> So Microsoft is explicitly telling the BIOS developers that there is no
> need to reserve the MMCONFIG space in the E820 table because Windows
> doesn't care. On that basis it doesn't seem like a valid check to
> require it to be so reserved, then.
> 
> Really, I think we should be basing this check on whether the
> corresponding memory range is reserved in the ACPI resources, like
> Windows expects. This does require putting more fingers into ACPI from
> this early boot stage, though..
> 

Intel had posted patches to do exactly that, but they were rejected.
I don't remember why now...


* Re: PCI Express MMCONFIG and BIOS Bug messages..
  2007-04-19 16:05 ` Chuck Ebbert
@ 2007-04-29  1:22   ` Robert Hancock
  2007-04-29 10:12     ` Andi Kleen
  0 siblings, 1 reply; 7+ messages in thread
From: Robert Hancock @ 2007-04-29  1:22 UTC (permalink / raw)
  To: Chuck Ebbert; +Cc: linux-kernel, Andi Kleen, Len Brown, linux-acpi

Chuck Ebbert wrote:
> Robert Hancock wrote:
>> I've seen a lot of systems (including brand new Xeon-based servers from
>> IBM and HP) that output messages on boot like:
>>
>> PCI: BIOS Bug: MCFG area at f0000000 is not E820-reserved
>> PCI: Not using MMCONFIG.
>>
>>
>> So Microsoft is explicitly telling the BIOS developers that there is no
>> need to reserve the MMCONFIG space in the E820 table because Windows
>> doesn't care. On that basis it doesn't seem like a valid check to
>> require it to be so reserved, then.
>>
>> Really, I think we should be basing this check on whether the
>> corresponding memory range is reserved in the ACPI resources, like
>> Windows expects. This does require putting more fingers into ACPI from
>> this early boot stage, though..
>>
> 
> Intel had posted patches to do exactly that, but they were rejected.
> I don't remember why now...

I tried adapting a patch by Rajesh Shah to do this for current kernels:

http://lkml.org/lkml/2006/6/23/365

It walks through all the motherboard resource devices and tries to pull 
out the resource settings for all of them using the _CRS method. 
(Depending on how you do the probing, the _STA method is called as well, 
either before or after.) From my limited ACPI knowledge, the problem is 
that the PCI MMCONFIG initialization is called before the main ACPI 
interpreter is enabled, and these control methods may try to access 
operation regions that don't have handlers set up for them yet, so a 
bunch of "no handler for region" errors show up.

I think an earlier version of this patch was in -mm for a while back 
in 2.6.18 times. I actually complained about it back then because it 
falsely detected that the region wasn't reserved on my system: it bailed 
out on the first such error before it found the reservation. On my 
system it turns out that the device called EXPL, which holds the MCFG 
table reservation, has the addresses statically defined in its _CRS 
method and doesn't need to access any regions. So if you make the search 
continue after errors, it does actually work. But there's probably no 
guarantee that all systems will have the MCFG reservation statically 
defined like this, and we can't have all those ACPI errors from other 
devices clogging the logs either.
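
To make the difference between bailing out on the first error and continuing the search concrete, here is a toy model. It is a sketch under stated assumptions: RegionError stands in for a "no handler for region" failure, the device named SYSR and the address range given to EXPL are invented, and nothing here calls real ACPI code.

```python
# Toy model of scanning motherboard-resource devices for a reservation.
# RegionError, the SYSR device and the EXPL range are invented for the sketch.
class RegionError(Exception):
    """Stands in for a 'no handler for region' failure while evaluating _CRS."""

def static_crs(*ranges):
    # A _CRS whose addresses are fixed and need no operation-region access
    return lambda: list(ranges)

def failing_crs():
    # A _CRS that touches a region with no handler installed yet
    raise RegionError("no handler for region")

def is_reserved(devices, start, end, stop_on_error):
    """Return True if one device's _CRS has a range covering [start, end)."""
    for name, crs in devices:
        try:
            ranges = crs()
        except RegionError:
            if stop_on_error:
                return False       # give up at the first failing device
            continue               # skip this device and keep searching
        if any(lo <= start and end <= hi for lo, hi in ranges):
            return True
    return False

devices = [
    ("SYSR", failing_crs),                              # hypothetical device
    ("EXPL", static_crs((0xf0000000, 0xf4000000))),     # static reservation
]

print(is_reserved(devices, 0xf0000000, 0xf0200000, stop_on_error=True))
print(is_reserved(devices, 0xf0000000, 0xf0200000, stop_on_error=False))
```

The first call reports the range as unreserved only because an unrelated device failed before EXPL was reached; the second finds the reservation, though every skipped device would still have logged an error.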

So essentially if we want to do this check based on ACPI resource 
reservations, we need to be able to execute control methods at the point 
that MMCONFIG is set up. Is there a reason why this can't be made 
possible (like by moving the necessary parts of ACPI initialization 
earlier)?

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/




* Re: PCI Express MMCONFIG and BIOS Bug messages..
  2007-04-29  1:22   ` Robert Hancock
@ 2007-04-29 10:12     ` Andi Kleen
  2007-04-29 18:20       ` Robert Hancock
  0 siblings, 1 reply; 7+ messages in thread
From: Andi Kleen @ 2007-04-29 10:12 UTC (permalink / raw)
  To: Robert Hancock; +Cc: Chuck Ebbert, linux-kernel, Len Brown, linux-acpi


> 
> I tried adapting a patch by Rajesh Shah to do this for current kernels:

The Intel patches checked against ACPI, which also didn't work in all cases.

You're right that the e820 check is overzealous and has a lot of false
positives, but it is the only generic way we know right now to handle a
common i965 BIOS bug. Also there is the nasty case of the Apple EFI boxes
where only mmconfig works, which has to be handled too.

I expect eventually the logic to be:

- If we know the hardware: read it from hw registers; trust them; ignore BIOS.
- Otherwise check e820 and ACPI resources and be very trigger happy about not
  using it.

> It walks through all the motherboard resource devices and tries to pull 
> out the resource settings for all of them using the _CRS method. 

I tested it originally on an Intel system with the above BIOS problem
and it didn't help there.

> (Depending on how you do the probing, the _STA method is called as well, 
> either before or after.) From my limited ACPI knowledge, the problem is 
> that the PCI MMCONFIG initialization is called before the main ACPI 
> interpreter is enabled, and these control methods may try to access 
> operation regions who don't have handlers set up for them yet, so a 
> bunch of "no handler for region" errors show up.

mmconfig access can be switched later without problems; so it would
be possible to boot using Type1 if it works (e.g. detect the Apple case) 
and switch later.

It's all quite tricky unfortunately; that is why I left it at the current
relatively safe state for now. After all mmconfig is normally not needed.

> So essentially if we want to do this check based on ACPI resource 
> reservations, we need to be able to execute control methods at the point 
> that MMCONFIG is set up. Is there a reason why this can't be made 
> possible (like by moving the necessary parts of ACPI initialization 
> earlier)?

The ACPI interpreter wants to allocate memory and use other kernel services
that are not available in really early boot. It could probably be done
somehow, but would be quite ugly with lots of special cases.

-Andi
 




* Re: PCI Express MMCONFIG and BIOS Bug messages..
  2007-04-29 10:12     ` Andi Kleen
@ 2007-04-29 18:20       ` Robert Hancock
  2007-04-29 18:27         ` Jesse Barnes
  0 siblings, 1 reply; 7+ messages in thread
From: Robert Hancock @ 2007-04-29 18:20 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Chuck Ebbert, linux-kernel, Len Brown, linux-acpi, Jesse Barnes

Andi Kleen wrote:
>> I tried adapting a patch by Rajesh Shah to do this for current kernels:
> 
> The Intel patches checked against ACPI which also didn't work in all cases.
> 
> You're right the e820 check is overzealous and has a lot of false positives,
> but it is the only generic way we know right now to handle a common i965 BIOS
> bug. Also there is the nasty case of the Apple EFI boxes where only mmconfig
> works which has to be handled too.
> 
> I expect eventually the logic to be:
> 
> - If we know the hardware: read it from hw registers; trust them; ignore BIOS.
> - Otherwise check e820 and ACPI resources and be very trigger happy at not using
> it

Problem is that even if we read the MMCONFIG table location from the 
hardware registers, that doesn't mean we can trust the result. It could 
be that the BIOS hasn't lied about where it put the table, it just stuck 
it someplace completely unsuitable like on top of RAM or other 
registers. It seems that with some of those 965 chipsets the latter is 
what the BIOS is actually doing, and so when we think we're writing to 
the table we're really writing to random chipset registers and hosing 
things. (Jesse Barnes ran into this while trying to add chipset support 
for the 965).

Likely what we need to do is:

- If the chipset is known, take the table address from registers; otherwise 
  check the MCFG table.
- Take the resulting area (ideally not just the first minimum part as we 
  check now, but the full area based on the expected length) and make sure 
  that the entire area is covered by a reservation in ACPI motherboard 
  resources.
- If that passes, we still need to sanity check the result by making 
  sure it hasn't been mapped over top of something else important. How to 
  do this depends on exactly how they've set up the ACPI reservations on 
  these broken boxes. Does someone have a full dmesg from one on a recent 
  kernel that shows all the pnpacpi resource reservation output?
- If these checks fail, we don't use the table; if the chipset is known, 
  we should also likely try to disable decoding of the region so that it 
  won't get in the way of anything else.
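
The steps above can be sketched as a small decision function. This is a model under assumptions, not a patch: all inputs are plain data (in a kernel they would come from chipset registers, the MCFG table and ACPI), the function names are made up, and the overlap test is just one possible form of the "mapped over something important" sanity check, which is left open above. The expected length uses the PCI Express layout of 1 MB of configuration space per bus.

```python
# Sketch of the proposed validation order. Names are hypothetical and the
# overlap test is only one possible form of the sanity check.
MB = 1 << 20

def mmconfig_size(start_bus, end_bus):
    # Each bus has 32 devices x 8 functions x 4 KB of config space = 1 MB
    return (end_bus - start_bus + 1) * MB

def covered(reservations, start, end):
    # True if a single reservation contains the whole [start, end) area
    return any(lo <= start and end <= hi for lo, hi in reservations)

def overlaps(regions, start, end):
    # True if [start, end) intersects any region we must not map over
    return any(lo < end and start < hi for lo, hi in regions)

def choose_mmconfig(chipset_base, mcfg_base, start_bus, end_bus,
                    acpi_reserved, other_regions):
    """Return (base, action); base is None when the area must not be used."""
    known_chipset = chipset_base is not None
    base = chipset_base if known_chipset else mcfg_base
    if base is not None:
        end = base + mmconfig_size(start_bus, end_bus)
        if (covered(acpi_reserved, base, end)
                and not overlaps(other_regions, base, end)):
            return base, "use"
    # Checks failed: on a known chipset, also turn off decoding of the area
    return None, ("disable-decode" if known_chipset else "ignore")

# Unknown chipset, MCFG table says f0000000 for buses 0-255, ACPI reserves it
print(choose_mmconfig(None, 0xf0000000, 0, 255,
                      [(0xf0000000, 0x100000000)], []))
```

Checking the full area matters: for buses 0-255 the area is 256 MB, so a reservation that covers only the first couple of megabytes would pass a minimal check and still fail this one.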

The current check we have really should go, though. It only excludes 
these broken chipsets based on luck, not on anything that is guaranteed, 
and ends up disabling the table on systems where it's perfectly functional.

> 
>> It walks through all the motherboard resource devices and tries to pull 
>> out the resource settings for all of them using the _CRS method. 
> 
> I tested it originally on a Intel system with the above BIOS problem
> and it didn't help there.
> 
>> (Depending on how you do the probing, the _STA method is called as well, 
>> either before or after.) From my limited ACPI knowledge, the problem is 
>> that the PCI MMCONFIG initialization is called before the main ACPI 
>> interpreter is enabled, and these control methods may try to access 
>> operation regions who don't have handlers set up for them yet, so a 
>> bunch of "no handler for region" errors show up.
> 
> mmconfig access can be switched later without problems; so it would
> be possible to boot using Type1 if it works (e.g. detect the Apple case) 
> and switch later.
> 
> It's all quite tricky unfortunately; that is why i left it at the current
> relatively safe state for now. After all mmconfig is normally not needed.
> 
>> So essentially if we want to do this check based on ACPI resource 
>> reservations, we need to be able to execute control methods at the point 
>> that MMCONFIG is set up. Is there a reason why this can't be made 
>> possible (like by moving the necessary parts of ACPI initialization 
>> earlier)?
> 
> ACPI Interpreter wants to allocate memory and use other kernel services that
> are not available in really early boot. It could be probably done somehow,
> but would be quite ugly with lots of special cases.

Yeah, if we can do this part of MMCONFIG initialization later that would 
likely be a better solution.

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/




* Re: PCI Express MMCONFIG and BIOS Bug messages..
  2007-04-29 18:20       ` Robert Hancock
@ 2007-04-29 18:27         ` Jesse Barnes
  2007-04-29 18:39           ` Robert Hancock
  0 siblings, 1 reply; 7+ messages in thread
From: Jesse Barnes @ 2007-04-29 18:27 UTC (permalink / raw)
  To: Robert Hancock
  Cc: Andi Kleen, Chuck Ebbert, linux-kernel, Len Brown, linux-acpi

On Sunday, April 29, 2007, Robert Hancock wrote:
> Problem is that even if we read the MMCONFIG table location from the
> hardware registers, that doesn't mean we can trust the result. It could
> be that the BIOS hasn't lied about where it put the table, it just stuck
> it someplace completely unsuitable like on top of RAM or other
> registers. It seems that with some of those 965 chipsets the latter is
> what the BIOS is actually doing, and so when we think we're writing to
> the table we're really writing to random chipset registers and hosing
> things. (Jesse Barnes ran into this while trying to add chipset support
> for the 965).

Right, I've updated the BIOS since, but at least that version was totally 
buggy wrt MMconfig support.  I haven't yet looked at the new one to see if 
it properly reserves MCFG space in ACPI _CRS or properly programs it.

> Likely what we need to do is:
>
> -If chipset is known, take table address from registers, otherwise check
> the MCFG table
> -Take the resulting area (Ideally not just the first minimum part as we
> check now, but the full area based on the expected length) and make sure
> that the entire area is covered by a reservation in ACPI motherboard
> resources.
> -If that passes, then we still need to sanity check the result by making
> sure it hasn't been mapped over top of something else important. How to
> do this depends on exactly how they've set up the ACPI reservations on
> these broken boxes.. Does someone have a full dmesg from one on a recent
> kernel that shows all the pnpacpi resource reservation output?
> -If these checks fail, we don't use the table, and the chipset is known,
> we should likely try to disable decoding of the region so that it won't
> get in the way of anything else.

Yeah, that sounds like a good algorithm.

I'm not sure how to handle the fact that we don't have access to the _CRS 
until late in boot though...  Len?

Jesse


* Re: PCI Express MMCONFIG and BIOS Bug messages..
  2007-04-29 18:27         ` Jesse Barnes
@ 2007-04-29 18:39           ` Robert Hancock
  0 siblings, 0 replies; 7+ messages in thread
From: Robert Hancock @ 2007-04-29 18:39 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Andi Kleen, Chuck Ebbert, linux-kernel, Len Brown, linux-acpi

Jesse Barnes wrote:
> On Sunday, April 29, 2007, Robert Hancock wrote:
>> Problem is that even if we read the MMCONFIG table location from the
>> hardware registers, that doesn't mean we can trust the result. It could
>> be that the BIOS hasn't lied about where it put the table, it just stuck
>> it someplace completely unsuitable like on top of RAM or other
>> registers. It seems that with some of those 965 chipsets the latter is
>> what the BIOS is actually doing, and so when we think we're writing to
>> the table we're really writing to random chipset registers and hosing
>> things. (Jesse Barnes ran into this while trying to add chipset support
>> for the 965).
> 
> Right, I've updated the BIOS since, but at least that version was totally 
> buggy wrt MMconfig support.  I haven't yet looked at the new one to see if 
> it properly reserves MCFG space in ACPI _CRS yet or properly programs it.
> 
>> Likely what we need to do is:
>>
>> -If chipset is known, take table address from registers, otherwise check
>> the MCFG table
>> -Take the resulting area (Ideally not just the first minimum part as we
>> check now, but the full area based on the expected length) and make sure
>> that the entire area is covered by a reservation in ACPI motherboard
>> resources.
>> -If that passes, then we still need to sanity check the result by making
>> sure it hasn't been mapped over top of something else important. How to
>> do this depends on exactly how they've set up the ACPI reservations on
>> these broken boxes.. Does someone have a full dmesg from one on a recent
>> kernel that shows all the pnpacpi resource reservation output?
>> -If these checks fail, we don't use the table, and the chipset is known,
>> we should likely try to disable decoding of the region so that it won't
>> get in the way of anything else.
> 
> Yeah, that sounds like a good algorithm.
> 
> I'm not sure how to handle the fact that we don't have access to the _CRS 
> until late in boot though...  Len?

We'd likely have to split the MMCONFIG initialization into two parts. 
The early part enables MMCONFIG only on systems where we require it 
(like the Macs that Andi mentioned). On all other systems we defer 
enabling it (use the regular PCI configuration mechanism) until the 
second part, after the ACPI interpreter is enabled, where we can poke 
around in ACPI and verify the table is suitable.
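
A minimal model of that split, purely for illustration: the class, its method names and the two boolean inputs are hypothetical, and "type1" simply labels the regular PCI configuration mechanism.

```python
# Two-stage model of the split; class and method names are hypothetical.
class PciConfigAccess:
    def __init__(self):
        self.method = "type1"      # regular PCI configuration mechanism

    def early_init(self, mmconfig_required):
        # Stage 1, before the ACPI interpreter runs: enable MMCONFIG only
        # on systems that cannot work without it.
        if mmconfig_required:
            self.method = "mmconfig"

    def late_init(self, table_verified):
        # Stage 2, after the ACPI interpreter is enabled: switch over only
        # if the table was verified against the ACPI reservations.
        if self.method == "type1" and table_verified:
            self.method = "mmconfig"

ordinary_pc = PciConfigAccess()
ordinary_pc.early_init(mmconfig_required=False)
print(ordinary_pc.method)          # still the regular mechanism during early boot
ordinary_pc.late_init(table_verified=True)
print(ordinary_pc.method)          # switched once the table has been checked
```

This relies on the access method being switchable after boot has started, as Andi noted earlier in the thread.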

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/


