* PCI Express MMCONFIG and BIOS Bug messages..
From: Robert Hancock @ 2007-04-19 1:25 UTC
To: linux-kernel; +Cc: Andi Kleen
I've seen a lot of systems (including brand new Xeon-based servers from
IBM and HP) that output messages on boot like:
PCI: BIOS Bug: MCFG area at f0000000 is not E820-reserved
PCI: Not using MMCONFIG.
As I understand it, this is sort of a sanity check mechanism to make
sure the MCFG address reported is remotely reasonable and intended to be
used as such. Problem is, I doubt the BIOS authors would agree that this
constitutes a bug. Microsoft provides a lot of the direction for
BIOS writers; have a look at the presentation "PCI Express,
Windows, And The Legacy Transition" from back in 2004:
http://download.microsoft.com/download/1/8/f/18f8cee2-0b64-41f2-893d-a6f2295b40c8/TW04047_WINHEC2004.ppt
On page 14, "Existing Windows - Reserve MMCONFIG":
Existing Windows versions won’t understand MCFG table
  * Backwards-compatible range reservation must be used
Report range in ACPI "Motherboard Resources"
  * _CRS of PNP0C02 node
  * PNP0C02 must be at \_SB scope
  * Range must be marked as consumed
Do not include range in _CRS of PCI root bus
  * If included, OS will assume that this range can be allocated to devices
E820 table/EFI memory map
  * Not necessary to describe MMConfig here
  * For Windows, these are used to describe RAM
  * No harm in including range as reserved either
So Microsoft is explicitly telling the BIOS developers that there is no
need to reserve the MMCONFIG space in the E820 table because Windows
doesn't care. On that basis it doesn't seem like a valid check to
require it to be so reserved, then.
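As a rough illustration of the kind of check behind that message, here is
a minimal sketch (not the kernel's actual code) that tests whether an
address range is fully covered by reserved entries in a firmware memory
map. The struct, the function name and the constants are made up for
illustration; the type values 1 (RAM) and 2 (reserved) follow the usual
E820 convention, and the map is assumed to be sorted by address.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of one firmware memory map entry. */
enum { MAP_RAM = 1, MAP_RESERVED = 2 };

struct map_entry {
    uint64_t addr;
    uint64_t size;
    int type;
};

/*
 * Return true if [start, end) is completely covered by entries of the
 * given type. Assumes the map is sorted by address.
 */
static bool range_all_mapped(const struct map_entry *map, size_t n,
                             uint64_t start, uint64_t end, int type)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t e_start = map[i].addr;
        uint64_t e_end = map[i].addr + map[i].size;

        if (map[i].type != type)
            continue;
        if (e_end <= start || e_start >= end)
            continue;           /* no overlap with what is left */
        if (e_start > start)
            return false;       /* hole before this entry */
        start = e_end;          /* covered up to here */
        if (start >= end)
            return true;
    }
    return false;
}
```

If the firmware map has no reserved entry covering the MCFG address, a
check of this shape fails and MMCONFIG is rejected, even when the range
is properly reserved in the ACPI motherboard resources.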
Really, I think we should be basing this check on whether the
corresponding memory range is reserved in the ACPI resources, like
Windows expects. This does require putting more fingers into ACPI from
this early boot stage, though..
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/
* Re: PCI Express MMCONFIG and BIOS Bug messages..
From: Chuck Ebbert @ 2007-04-19 16:05 UTC
To: Robert Hancock; +Cc: linux-kernel, Andi Kleen
Robert Hancock wrote:
> I've seen a lot of systems (including brand new Xeon-based servers from
> IBM and HP) that output messages on boot like:
>
> PCI: BIOS Bug: MCFG area at f0000000 is not E820-reserved
> PCI: Not using MMCONFIG.
>
>
> So Microsoft is explicitly telling the BIOS developers that there is no
> need to reserve the MMCONFIG space in the E820 table because Windows
> doesn't care. On that basis it doesn't seem like a valid check to
> require it to be so reserved, then.
>
> Really, I think we should be basing this check on whether the
> corresponding memory range is reserved in the ACPI resources, like
> Windows expects. This does require putting more fingers into ACPI from
> this early boot stage, though..
>
Intel had posted patches to do exactly that, but they were rejected.
I don't remember why now...
* Re: PCI Express MMCONFIG and BIOS Bug messages..
From: Robert Hancock @ 2007-04-29 1:22 UTC
To: Chuck Ebbert; +Cc: linux-kernel, Andi Kleen, Len Brown, linux-acpi
Chuck Ebbert wrote:
> Robert Hancock wrote:
>> I've seen a lot of systems (including brand new Xeon-based servers from
>> IBM and HP) that output messages on boot like:
>>
>> PCI: BIOS Bug: MCFG area at f0000000 is not E820-reserved
>> PCI: Not using MMCONFIG.
>>
>>
>> So Microsoft is explicitly telling the BIOS developers that there is no
>> need to reserve the MMCONFIG space in the E820 table because Windows
>> doesn't care. On that basis it doesn't seem like a valid check to
>> require it to be so reserved, then.
>>
>> Really, I think we should be basing this check on whether the
>> corresponding memory range is reserved in the ACPI resources, like
>> Windows expects. This does require putting more fingers into ACPI from
>> this early boot stage, though..
>>
>
> Intel had posted patches to do exactly that, but they were rejected.
> I don't remember why now...
I tried adapting a patch by Rajesh Shah to do this for current kernels:
http://lkml.org/lkml/2006/6/23/365
It walks through all the motherboard resource devices and tries to pull
out the resource settings for all of them using the _CRS method.
(Depending on how you do the probing, the _STA method is called as well,
either before or after.) From my limited ACPI knowledge, the problem is
that the PCI MMCONFIG initialization is called before the main ACPI
interpreter is enabled, and these control methods may try to access
operation regions that don't have handlers set up for them yet, so a
bunch of "no handler for region" errors show up.
I think some earlier version of this patch was in -mm for a while back
in 2.6.18 times. I actually complained about it back then because it
falsely detected that the region wasn't reserved on my system: it bailed
out on the first such error before it found the reservation. On my
system it turns out that the device called EXPL, which holds the MCFG
table reservation, has the addresses statically defined in its _CRS
method and doesn't need to access any regions, so if you make the search
continue on after errors, it does actually work. But there's probably no
guarantee that all systems will have the MCFG reservation statically
defined like this, and we can't have all those ACPI errors from other
devices clogging the logs either.
So essentially if we want to do this check based on ACPI resource
reservations, we need to be able to execute control methods at the point
that MMCONFIG is set up. Is there a reason why this can't be made
possible (like by moving the necessary parts of ACPI initialization
earlier)?
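To make the "continue on after errors" point concrete, here is a minimal
sketch of the search. It does not use the real ACPI interfaces: the
structs, the callback standing in for _CRS evaluation, and the example
data (including the 256 MiB length) are all assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one memory range decoded from _CRS. */
struct mem_res {
    uint64_t start;
    uint64_t len;
};

/* Hypothetical stand-in for a PNP0C02 motherboard resources device. */
struct mb_device {
    const char *name;
    /* Models evaluating _CRS: fills out[] and returns the number of
     * ranges, or -1 if evaluation fails ("no handler for region"). */
    int (*eval_crs)(struct mem_res *out, size_t max);
};

#define MAX_RES 16

/*
 * Return true if [start, start + len) lies inside a single range
 * claimed by any device. A failing _CRS is skipped, not fatal.
 */
static bool reserved_in_mb_resources(const struct mb_device *devs,
                                     size_t ndevs,
                                     uint64_t start, uint64_t len)
{
    struct mem_res res[MAX_RES];

    for (size_t i = 0; i < ndevs; i++) {
        int n = devs[i].eval_crs(res, MAX_RES);

        if (n < 0)
            continue;   /* keep searching instead of bailing out */

        for (int j = 0; j < n; j++) {
            if (res[j].start <= start &&
                start + len <= res[j].start + res[j].len)
                return true;
        }
    }
    return false;
}

/* Example callbacks (made-up data): one device whose _CRS fails ... */
static int crs_fails(struct mem_res *out, size_t max)
{
    (void)out;
    (void)max;
    return -1;
}

/* ... and one with a statically defined reservation. */
static int crs_static(struct mem_res *out, size_t max)
{
    if (max < 1)
        return 0;
    out[0].start = 0xf0000000u;
    out[0].len = 0x10000000u;   /* 256 MiB, assumed for the example */
    return 1;
}
```

With the failing device listed first, the search still finds the static
reservation. Bailing out on the first error would report the range as
unreserved, which is the false negative described above.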
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/
* Re: PCI Express MMCONFIG and BIOS Bug messages..
From: Andi Kleen @ 2007-04-29 10:12 UTC
To: Robert Hancock; +Cc: Chuck Ebbert, linux-kernel, Len Brown, linux-acpi
>
> I tried adapting a patch by Rajesh Shah to do this for current kernels:
The Intel patches checked against ACPI, which also didn't work in all cases.
You're right that the e820 check is overzealous and has a lot of false positives,
but it is the only generic way we know right now to handle a common i965 BIOS
bug. Also there is the nasty case of the Apple EFI boxes where only mmconfig
works, which has to be handled too.
I expect eventually the logic to be:
- If we know the hardware: read it from hw registers; trust them; ignore BIOS.
- Otherwise check e820 and ACPI resources and be very trigger happy at not using
it
> It walks through all the motherboard resource devices and tries to pull
> out the resource settings for all of them using the _CRS method.
I tested it originally on an Intel system with the above BIOS problem
and it didn't help there.
> (Depending on how you do the probing, the _STA method is called as well,
> either before or after.) From my limited ACPI knowledge, the problem is
> that the PCI MMCONFIG initialization is called before the main ACPI
> interpreter is enabled, and these control methods may try to access
> operation regions that don't have handlers set up for them yet, so a
> bunch of "no handler for region" errors show up.
mmconfig access can be switched later without problems; so it would
be possible to boot using Type1 if it works (e.g. detect the Apple case)
and switch later.
It's all quite tricky unfortunately; that is why I left it at the current
relatively safe state for now. After all mmconfig is normally not needed.
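For context, a minimal sketch of how the two access methods address a
register. The bit layouts are the standard ones; the function names are
made up for illustration, and the MMCONFIG base is assumed to correspond
to bus 0. Type 1 only reaches the first 256 bytes of each function's
config space, so MMCONFIG only matters for extended registers at offset
0x100 and up.

```c
#include <stdint.h>

/*
 * Type 1: value written to the CONFIG_ADDRESS port (0xCF8) before
 * reading or writing the data port (0xCFC). Bit 31 is the enable bit.
 */
static uint32_t type1_config_address(unsigned bus, unsigned dev,
                                     unsigned fn, unsigned reg)
{
    return 0x80000000u |
           ((bus & 0xffu) << 16) |
           ((dev & 0x1fu) << 11) |
           ((fn & 0x7u) << 8) |
           (reg & 0xfcu);
}

/*
 * MMCONFIG: physical address of a config register. Each function gets
 * 4 KiB, so a bus takes 32 * 8 * 4 KiB = 1 MiB of address space.
 */
static uint64_t mmconfig_address(uint64_t base, unsigned bus,
                                 unsigned dev, unsigned fn, unsigned reg)
{
    return base +
           (((uint64_t)(bus & 0xffu) << 20) |
            ((uint64_t)(dev & 0x1fu) << 15) |
            ((uint64_t)(fn & 0x7u) << 12) |
            (reg & 0xfffu));
}
```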
> So essentially if we want to do this check based on ACPI resource
> reservations, we need to be able to execute control methods at the point
> that MMCONFIG is set up. Is there a reason why this can't be made
> possible (like by moving the necessary parts of ACPI initialization
> earlier)?
The ACPI interpreter wants to allocate memory and use other kernel services that
are not available in really early boot. It could probably be done somehow,
but would be quite ugly with lots of special cases.
-Andi
* Re: PCI Express MMCONFIG and BIOS Bug messages..
From: Robert Hancock @ 2007-04-29 18:20 UTC
To: Andi Kleen
Cc: Chuck Ebbert, linux-kernel, Len Brown, linux-acpi, Jesse Barnes
Andi Kleen wrote:
>> I tried adapting a patch by Rajesh Shah to do this for current kernels:
>
> The Intel patches checked against ACPI which also didn't work in all cases.
>
> You're right the e820 check is overzealous and has a lot of false positives,
> but it is the only generic way we know right now to handle a common i965 BIOS
> bug. Also there is the nasty case of the Apple EFI boxes where only mmconfig
> works which has to be handled too.
>
> I expect eventually the logic to be:
>
> - If we know the hardware: read it from hw registers; trust them; ignore BIOS.
> - Otherwise check e820 and ACPI resources and be very trigger happy at not using
> it
Problem is that even if we read the MMCONFIG table location from the
hardware registers, that doesn't mean we can trust the result. It could
be that the BIOS hasn't lied about where it put the table, it just stuck
it someplace completely unsuitable like on top of RAM or other
registers. It seems that with some of those 965 chipsets the latter is
what the BIOS is actually doing, and so when we think we're writing to
the table we're really writing to random chipset registers and hosing
things. (Jesse Barnes ran into this while trying to add chipset support
for the 965).
Likely what we need to do is:
-If chipset is known, take table address from registers, otherwise check
the MCFG table
-Take the resulting area (Ideally not just the first minimum part as we
check now, but the full area based on the expected length) and make sure
that the entire area is covered by a reservation in ACPI motherboard
resources.
-If that passes, then we still need to sanity check the result by making
sure it hasn't been mapped over top of something else important. How to
do this depends on exactly how they've set up the ACPI reservations on
these broken boxes.. Does someone have a full dmesg from one on a recent
kernel that shows all the pnpacpi resource reservation output?
-If these checks fail, we don't use the table, and the chipset is known,
we should likely try to disable decoding of the region so that it won't
get in the way of anything else.
The current check we have really should go, though. It only excludes
these broken chipsets based on luck, not on anything that is guaranteed,
and ends up disabling the table on systems where it's perfectly functional.
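The two validation steps in the middle of that list could be sketched
roughly as below. This is only an illustration under assumptions: the
types and function names are invented, the list of "important" ranges is
whatever the caller knows to be in use (RAM, chipset registers), and
choosing the address source and disabling decoding on failure are left
out. The full length follows from the bus range, at 1 MiB per bus.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct range {          /* half-open: [start, end) */
    uint64_t start;
    uint64_t end;
};

enum mmcfg_verdict {
    MMCFG_OK,
    MMCFG_NOT_RESERVED,
    MMCFG_CONFLICT,
};

/* 32 devices x 8 functions x 4 KiB of config space = 1 MiB per bus. */
static uint64_t mmconfig_len(unsigned start_bus, unsigned end_bus)
{
    return (uint64_t)(end_bus - start_bus + 1) << 20;
}

static bool contains(struct range outer, struct range inner)
{
    return outer.start <= inner.start && inner.end <= outer.end;
}

static bool overlaps(struct range a, struct range b)
{
    return a.start < b.end && b.start < a.end;
}

/*
 * The whole area must sit inside one reservation, and must not overlap
 * anything the caller lists as already in use.
 */
static enum mmcfg_verdict check_mmconfig(struct range area,
                                         const struct range *reserved,
                                         size_t nres,
                                         const struct range *in_use,
                                         size_t nuse)
{
    bool covered = false;

    for (size_t i = 0; i < nres; i++)
        if (contains(reserved[i], area))
            covered = true;
    if (!covered)
        return MMCFG_NOT_RESERVED;

    for (size_t i = 0; i < nuse; i++)
        if (overlaps(in_use[i], area))
            return MMCFG_CONFLICT;

    return MMCFG_OK;
}
```

A table covering buses 0 through 255 needs 256 MiB, so checking only the
first couple of megabytes can miss a reservation that is too short.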
>
>> It walks through all the motherboard resource devices and tries to pull
>> out the resource settings for all of them using the _CRS method.
>
> I tested it originally on an Intel system with the above BIOS problem
> and it didn't help there.
>
>> (Depending on how you do the probing, the _STA method is called as well,
>> either before or after.) From my limited ACPI knowledge, the problem is
>> that the PCI MMCONFIG initialization is called before the main ACPI
>> interpreter is enabled, and these control methods may try to access
>> operation regions that don't have handlers set up for them yet, so a
>> bunch of "no handler for region" errors show up.
>
> mmconfig access can be switched later without problems; so it would
> be possible to boot using Type1 if it works (e.g. detect the Apple case)
> and switch later.
>
> It's all quite tricky unfortunately; that is why I left it at the current
> relatively safe state for now. After all mmconfig is normally not needed.
>
>> So essentially if we want to do this check based on ACPI resource
>> reservations, we need to be able to execute control methods at the point
>> that MMCONFIG is set up. Is there a reason why this can't be made
>> possible (like by moving the necessary parts of ACPI initialization
>> earlier)?
>
> The ACPI interpreter wants to allocate memory and use other kernel services that
> are not available in really early boot. It could probably be done somehow,
> but would be quite ugly with lots of special cases.
Yeah, if we can do this part of MMCONFIG initialization later that would
likely be a better solution.
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/
* Re: PCI Express MMCONFIG and BIOS Bug messages..
From: Jesse Barnes @ 2007-04-29 18:27 UTC
To: Robert Hancock
Cc: Andi Kleen, Chuck Ebbert, linux-kernel, Len Brown, linux-acpi
On Sunday, April 29, 2007, Robert Hancock wrote:
> Problem is that even if we read the MMCONFIG table location from the
> hardware registers, that doesn't mean we can trust the result. It could
> be that the BIOS hasn't lied about where it put the table, it just stuck
> it someplace completely unsuitable like on top of RAM or other
> registers. It seems that with some of those 965 chipsets the latter is
> what the BIOS is actually doing, and so when we think we're writing to
> the table we're really writing to random chipset registers and hosing
> things. (Jesse Barnes ran into this while trying to add chipset support
> for the 965).
Right, I've updated the BIOS since, but at least that version was totally
buggy wrt MMconfig support. I haven't yet looked at the new one to see if
it properly reserves MCFG space in ACPI _CRS or properly programs it.
> Likely what we need to do is:
>
> -If chipset is known, take table address from registers, otherwise check
> the MCFG table
> -Take the resulting area (Ideally not just the first minimum part as we
> check now, but the full area based on the expected length) and make sure
> that the entire area is covered by a reservation in ACPI motherboard
> resources.
> -If that passes, then we still need to sanity check the result by making
> sure it hasn't been mapped over top of something else important. How to
> do this depends on exactly how they've set up the ACPI reservations on
> these broken boxes.. Does someone have a full dmesg from one on a recent
> kernel that shows all the pnpacpi resource reservation output?
> -If these checks fail, we don't use the table, and the chipset is known,
> we should likely try to disable decoding of the region so that it won't
> get in the way of anything else.
Yeah, that sounds like a good algorithm.
I'm not sure how to handle the fact that we don't have access to the _CRS
until late in boot though... Len?
Jesse
* Re: PCI Express MMCONFIG and BIOS Bug messages..
From: Robert Hancock @ 2007-04-29 18:39 UTC
To: Jesse Barnes
Cc: Andi Kleen, Chuck Ebbert, linux-kernel, Len Brown, linux-acpi
Jesse Barnes wrote:
> On Sunday, April 29, 2007, Robert Hancock wrote:
>> Problem is that even if we read the MMCONFIG table location from the
>> hardware registers, that doesn't mean we can trust the result. It could
>> be that the BIOS hasn't lied about where it put the table, it just stuck
>> it someplace completely unsuitable like on top of RAM or other
>> registers. It seems that with some of those 965 chipsets the latter is
>> what the BIOS is actually doing, and so when we think we're writing to
>> the table we're really writing to random chipset registers and hosing
>> things. (Jesse Barnes ran into this while trying to add chipset support
>> for the 965).
>
> Right, I've updated the BIOS since, but at least that version was totally
> buggy wrt MMconfig support. I haven't yet looked at the new one to see if
> it properly reserves MCFG space in ACPI _CRS yet or properly programs it.
>
>> Likely what we need to do is:
>>
>> -If chipset is known, take table address from registers, otherwise check
>> the MCFG table
>> -Take the resulting area (Ideally not just the first minimum part as we
>> check now, but the full area based on the expected length) and make sure
>> that the entire area is covered by a reservation in ACPI motherboard
>> resources.
>> -If that passes, then we still need to sanity check the result by making
>> sure it hasn't been mapped over top of something else important. How to
>> do this depends on exactly how they've set up the ACPI reservations on
>> these broken boxes.. Does someone have a full dmesg from one on a recent
>> kernel that shows all the pnpacpi resource reservation output?
>> -If these checks fail, we don't use the table, and the chipset is known,
>> we should likely try to disable decoding of the region so that it won't
>> get in the way of anything else.
>
> Yeah, that sounds like a good algorithm.
>
> I'm not sure how to handle the fact that we don't have access to the _CRS
> until late in boot though... Len?
We'd likely have to split the MMCONFIG initialization into two parts.
The early part enables MMCONFIG only on systems where we require it
(like the Macs that Andi mentioned). On all other systems we defer
enabling it (use the regular PCI configuration mechanism) until the
second part, after the ACPI interpreter is enabled, where we can poke
around in ACPI and verify the table is suitable.
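As a sketch of that split (names and structure are hypothetical, not an
existing kernel interface), the early stage would only pick MMCONFIG
where it is required, and the late stage would upgrade from Type 1 once
the ACPI-based checks can run:

```c
#include <stdbool.h>

enum cfg_method { CFG_TYPE1, CFG_MMCONFIG };

/* Hypothetical state tracking which config access method is active. */
struct pci_cfg_state {
    enum cfg_method method;
    bool mmconfig_required;   /* platforms where only MMCONFIG works */
};

/* Early boot: no ACPI interpreter yet, so no _CRS checks are possible. */
static void mmconfig_early_init(struct pci_cfg_state *s, bool required)
{
    s->mmconfig_required = required;
    s->method = required ? CFG_MMCONFIG : CFG_TYPE1;
}

/* After the ACPI interpreter is up: switch only if the checks passed. */
static void mmconfig_late_init(struct pci_cfg_state *s,
                               bool acpi_checks_passed)
{
    if (s->method == CFG_MMCONFIG)
        return;               /* already enabled in the early stage */
    if (acpi_checks_passed)
        s->method = CFG_MMCONFIG;
}
```

This relies on the earlier point that the access method can be switched
after boot has started without problems.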
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/