* PCI Express MMCONFIG and BIOS Bug messages..
From: Robert Hancock @ 2007-04-19 1:25 UTC
To: linux-kernel; +Cc: Andi Kleen

I've seen a lot of systems (including brand new Xeon-based servers from IBM and HP) that output messages on boot like:

PCI: BIOS Bug: MCFG area at f0000000 is not E820-reserved
PCI: Not using MMCONFIG.

As I understand it, this is sort of a sanity check mechanism to make sure the MCFG address reported is remotely reasonable and intended to be used as such. Problem is, I doubt the BIOS authors would agree that this constitutes a bug. Microsoft is providing a lot of the direction for BIOS writers, and have a look at this presentation "PCI Express, Windows, And The Legacy Transition" from back in 2004:

http://download.microsoft.com/download/1/8/f/18f8cee2-0b64-41f2-893d-a6f2295b40c8/TW04047_WINHEC2004.ppt

On page 14, "Existing Windows - Reserve MMCONFIG":

Existing Windows versions won’t understand MCFG table
  * Backwards-compatible range reservation must be used
Report range in ACPI "Motherboard Resources"
  * _CRS of PNP0C02 node
  * PNP0C02 must be at \_SB scope
  * Range must be marked as consumed
Do not include range in _CRS of PCI root bus
  * If included, OS will assume that this range can be allocated to devices
E820 table/EFI memory map
  * Not necessary to describe MMConfig here
  * For Windows, these are used to describe RAM
  * No harm in including range as reserved either

So Microsoft is explicitly telling the BIOS developers that there is no need to reserve the MMCONFIG space in the E820 table because Windows doesn't care. On that basis it doesn't seem like a valid check to require it to be so reserved, then.

Really, I think we should be basing this check on whether the corresponding memory range is reserved in the ACPI resources, like Windows expects. This does require putting more fingers into ACPI from this early boot stage, though..

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/
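[Editorial sketch] The difference between the two checks discussed in this message can be illustrated with a small range-coverage test. This is only a sketch, not the kernel's code: `struct mem_range`, `range_all_mapped()` and both example maps are hypothetical names and made-up data, chosen so that the E820-style map omits the area at f0000000 while the motherboard-resource list includes it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum range_type { RANGE_RAM, RANGE_RESERVED };

struct mem_range {
    uint64_t start;          /* inclusive */
    uint64_t end;            /* exclusive */
    enum range_type type;
};

/*
 * Return true if [start, end) is completely covered by entries of the
 * given type. Assumes the map is sorted by start address.
 */
static bool range_all_mapped(const struct mem_range *map, size_t n,
                             uint64_t start, uint64_t end,
                             enum range_type type)
{
    for (size_t i = 0; i < n; i++) {
        const struct mem_range *r = &map[i];

        if (r->type != type)
            continue;
        if (r->end <= start || r->start >= end)
            continue;            /* does not touch what is left */
        if (r->start > start)
            return false;        /* hole before this entry */
        start = r->end;          /* covered up to here */
        if (start >= end)
            return true;
    }
    return false;
}

/* Hypothetical E820 map: the MCFG area at f0000000 is not listed. */
static const struct mem_range example_e820[] = {
    { 0x00000000, 0x0009f000, RANGE_RAM },
    { 0x00100000, 0xdff00000, RANGE_RAM },
    { 0xfec00000, 0xfed00000, RANGE_RESERVED },
};

/* Hypothetical PNP0C02 _CRS contents: the same area is reserved here. */
static const struct mem_range example_mb_resources[] = {
    { 0xf0000000, 0xf4000000, RANGE_RESERVED },
};
```

With this data, asking whether the start of the MCFG area is reserved fails against `example_e820` and succeeds against `example_mb_resources`, which is the situation the slide seems to permit.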
* Re: PCI Express MMCONFIG and BIOS Bug messages..
From: Chuck Ebbert @ 2007-04-19 16:05 UTC
To: Robert Hancock; +Cc: linux-kernel, Andi Kleen

Robert Hancock wrote:
> I've seen a lot of systems (including brand new Xeon-based servers from
> IBM and HP) that output messages on boot like:
>
> PCI: BIOS Bug: MCFG area at f0000000 is not E820-reserved
> PCI: Not using MMCONFIG.
>
>
> So Microsoft is explicitly telling the BIOS developers that there is no
> need to reserve the MMCONFIG space in the E820 table because Windows
> doesn't care. On that basis it doesn't seem like a valid check to
> require it to be so reserved, then.
>
> Really, I think we should be basing this check on whether the
> corresponding memory range is reserved in the ACPI resources, like
> Windows expects. This does require putting more fingers into ACPI from
> this early boot stage, though..
>

Intel had posted patches to do exactly that, but they were rejected.
I don't remember why now...
* Re: PCI Express MMCONFIG and BIOS Bug messages..
From: Robert Hancock @ 2007-04-29 1:22 UTC
To: Chuck Ebbert; +Cc: linux-kernel, Andi Kleen, Len Brown, linux-acpi

Chuck Ebbert wrote:
> Robert Hancock wrote:
>> I've seen a lot of systems (including brand new Xeon-based servers from
>> IBM and HP) that output messages on boot like:
>>
>> PCI: BIOS Bug: MCFG area at f0000000 is not E820-reserved
>> PCI: Not using MMCONFIG.
>>
>>
>> So Microsoft is explicitly telling the BIOS developers that there is no
>> need to reserve the MMCONFIG space in the E820 table because Windows
>> doesn't care. On that basis it doesn't seem like a valid check to
>> require it to be so reserved, then.
>>
>> Really, I think we should be basing this check on whether the
>> corresponding memory range is reserved in the ACPI resources, like
>> Windows expects. This does require putting more fingers into ACPI from
>> this early boot stage, though..
>>
>
> Intel had posted patches to do exactly that, but they were rejected.
> I don't remember why now...

I tried adapting a patch by Rajesh Shah to do this for current kernels:

http://lkml.org/lkml/2006/6/23/365

It walks through all the motherboard resource devices and tries to pull out the resource settings for all of them using the _CRS method. (Depending on how you do the probing, the _STA method is called as well, either before or after.) From my limited ACPI knowledge, the problem is that the PCI MMCONFIG initialization is called before the main ACPI interpreter is enabled, and these control methods may try to access operation regions that don't have handlers set up for them yet, so a bunch of "no handler for region" errors show up.

I think some earlier version of this patch was in -mm for a while back in 2.6.18 times. I actually complained about it back then because it falsely detected that the region wasn't reserved on my system, since it bailed out on the first such error before it found the reservation.

On my system it turns out that the device called EXPL that has the MCFG table reservation in it has the addresses statically defined in the _CRS method and doesn't need to access any regions, so if you make the search continue on after errors, it does actually work. But there's probably no guarantee that all systems will have the MCFG reservation statically defined like this, and we can't have all those ACPI errors from other devices clogging the logs either.

So essentially if we want to do this check based on ACPI resource reservations, we need to be able to execute control methods at the point that MMCONFIG is set up. Is there a reason why this can't be made possible (like by moving the necessary parts of ACPI initialization earlier)?

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/
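[Editorial sketch] The "bail out on the first error" versus "continue after errors" behaviour described in this message can be modelled as below. Nothing here is ACPI code: `struct mb_device`, its `crs_status` field and `find_reservation()` are hypothetical, and a failed _CRS evaluation is reduced to a nonzero status. Only the device name EXPL comes from the message itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One motherboard-resource device and the outcome of evaluating its _CRS. */
struct mb_device {
    const char *name;
    int crs_status;          /* 0 = _CRS evaluated, nonzero = failed */
    uint64_t start, end;     /* [start, end), valid if crs_status == 0 */
};

/*
 * Look for a device whose reservation contains [start, end).
 * With stop_on_error set, the first failed evaluation aborts the search;
 * otherwise failures are counted and the walk continues.
 */
static bool find_reservation(const struct mb_device *devs, size_t n,
                             uint64_t start, uint64_t end,
                             bool stop_on_error, unsigned int *failures)
{
    *failures = 0;
    for (size_t i = 0; i < n; i++) {
        if (devs[i].crs_status != 0) {
            (*failures)++;
            if (stop_on_error)
                return false;
            continue;
        }
        if (devs[i].start <= start && end <= devs[i].end)
            return true;
    }
    return false;
}
```

If a device that fails to evaluate happens to be visited before EXPL, the stop-on-error variant reports the area as unreserved even though the reservation exists. Continuing past errors finds it, at the cost of whatever error output the failed evaluations produce.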
* Re: PCI Express MMCONFIG and BIOS Bug messages..
From: Andi Kleen @ 2007-04-29 10:12 UTC
To: Robert Hancock; +Cc: Chuck Ebbert, linux-kernel, Len Brown, linux-acpi

>
> I tried adapting a patch by Rajesh Shah to do this for current kernels:

The Intel patches checked against ACPI which also didn't work in all cases.

You're right the e820 check is overzealous and has a lot of false positives, but it is the only generic way we know right now to handle a common i965 BIOS bug. Also there is the nasty case of the Apple EFI boxes where only mmconfig works which has to be handled too.

I expect eventually the logic to be:

- If we know the hardware: read it from hw registers; trust them; ignore BIOS.
- Otherwise check e820 and ACPI resources and be very trigger happy at not using it

> It walks through all the motherboard resource devices and tries to pull
> out the resource settings for all of them using the _CRS method.

I tested it originally on an Intel system with the above BIOS problem and it didn't help there.

> (Depending on how you do the probing, the _STA method is called as well,
> either before or after.) From my limited ACPI knowledge, the problem is
> that the PCI MMCONFIG initialization is called before the main ACPI
> interpreter is enabled, and these control methods may try to access
> operation regions that don't have handlers set up for them yet, so a
> bunch of "no handler for region" errors show up.

mmconfig access can be switched later without problems; so it would be possible to boot using Type1 if it works (e.g. detect the Apple case) and switch later.

It's all quite tricky unfortunately; that is why I left it at the current relatively safe state for now. After all mmconfig is normally not needed.

> So essentially if we want to do this check based on ACPI resource
> reservations, we need to be able to execute control methods at the point
> that MMCONFIG is set up. Is there a reason why this can't be made
> possible (like by moving the necessary parts of ACPI initialization
> earlier)?

ACPI Interpreter wants to allocate memory and use other kernel services that are not available in really early boot. It could be probably done somehow, but would be quite ugly with lots of special cases.

-Andi
* Re: PCI Express MMCONFIG and BIOS Bug messages..
From: Robert Hancock @ 2007-04-29 18:20 UTC
To: Andi Kleen; +Cc: Chuck Ebbert, linux-kernel, Len Brown, linux-acpi, Jesse Barnes

Andi Kleen wrote:
>> I tried adapting a patch by Rajesh Shah to do this for current kernels:
>
> The Intel patches checked against ACPI which also didn't work in all cases.
>
> You're right the e820 check is overzealous and has a lot of false positives,
> but it is the only generic way we know right now to handle a common i965 BIOS
> bug. Also there is the nasty case of the Apple EFI boxes where only mmconfig
> works which has to be handled too.
>
> I expect eventually the logic to be:
>
> - If we know the hardware: read it from hw registers; trust them; ignore BIOS.
> - Otherwise check e820 and ACPI resources and be very trigger happy at not using
> it

Problem is that even if we read the MMCONFIG table location from the hardware registers, that doesn't mean we can trust the result. It could be that the BIOS hasn't lied about where it put the table, it just stuck it someplace completely unsuitable like on top of RAM or other registers. It seems that with some of those 965 chipsets the latter is what the BIOS is actually doing, and so when we think we're writing to the table we're really writing to random chipset registers and hosing things. (Jesse Barnes ran into this while trying to add chipset support for the 965).

Likely what we need to do is:

-If chipset is known, take table address from registers, otherwise check the MCFG table
-Take the resulting area (ideally not just the first minimum part as we check now, but the full area based on the expected length) and make sure that the entire area is covered by a reservation in ACPI motherboard resources.
-If that passes, then we still need to sanity check the result by making sure it hasn't been mapped over top of something else important. How to do this depends on exactly how they've set up the ACPI reservations on these broken boxes.. Does someone have a full dmesg from one on a recent kernel that shows all the pnpacpi resource reservation output?
-If these checks fail, we don't use the table, and the chipset is known, we should likely try to disable decoding of the region so that it won't get in the way of anything else.

The current check we have really should go, though. It only excludes these broken chipsets based on luck, not on anything that is guaranteed, and ends up disabling the table on systems where it's perfectly functional.

>
>> It walks through all the motherboard resource devices and tries to pull
>> out the resource settings for all of them using the _CRS method.
>
> I tested it originally on an Intel system with the above BIOS problem
> and it didn't help there.
>
>> (Depending on how you do the probing, the _STA method is called as well,
>> either before or after.) From my limited ACPI knowledge, the problem is
>> that the PCI MMCONFIG initialization is called before the main ACPI
>> interpreter is enabled, and these control methods may try to access
>> operation regions that don't have handlers set up for them yet, so a
>> bunch of "no handler for region" errors show up.
>
> mmconfig access can be switched later without problems; so it would
> be possible to boot using Type1 if it works (e.g. detect the Apple case)
> and switch later.
>
> It's all quite tricky unfortunately; that is why I left it at the current
> relatively safe state for now. After all mmconfig is normally not needed.
>
>> So essentially if we want to do this check based on ACPI resource
>> reservations, we need to be able to execute control methods at the point
>> that MMCONFIG is set up. Is there a reason why this can't be made
>> possible (like by moving the necessary parts of ACPI initialization
>> earlier)?
>
> ACPI Interpreter wants to allocate memory and use other kernel services that
> are not available in really early boot. It could be probably done somehow,
> but would be quite ugly with lots of special cases.

Yeah, if we can do this part of MMCONFIG initialization later that would likely be a better solution.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/
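[Editorial sketch] The four steps listed in this message can be written down as one decision function. This is a sketch under stated assumptions, not a proposed patch: all type and function names are hypothetical, and the "mapped over something important" test is reduced to an overlap check against RAM ranges, which is only one possible reading of that step. The area length uses the standard PCI Express layout of 1 MiB of configuration space per bus (32 devices x 8 functions x 4 KiB), and `end_bus >= start_bus` is assumed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum mmcfg_verdict {
    MMCFG_USE,
    MMCFG_SKIP,
    MMCFG_SKIP_AND_DISABLE_DECODE,
};

struct span { uint64_t start, end; };   /* [start, end) */

struct mmcfg_candidate {
    bool chipset_known;
    uint64_t chipset_base;   /* from hardware registers, if chipset_known */
    uint64_t mcfg_base;      /* from the ACPI MCFG table */
    uint8_t start_bus, end_bus;
};

/* True if a single entry of v fully contains s. */
static bool covered_by_one(const struct span *v, size_t n, struct span s)
{
    for (size_t i = 0; i < n; i++)
        if (v[i].start <= s.start && s.end <= v[i].end)
            return true;
    return false;
}

/* True if any entry of v overlaps s. */
static bool overlaps_any(const struct span *v, size_t n, struct span s)
{
    for (size_t i = 0; i < n; i++)
        if (v[i].start < s.end && s.start < v[i].end)
            return true;
    return false;
}

static enum mmcfg_verdict mmcfg_check(const struct mmcfg_candidate *c,
                                      const struct span *mb_res, size_t n_mb,
                                      const struct span *ram, size_t n_ram)
{
    /* Step 1: prefer the chipset registers when the chipset is known. */
    uint64_t base = c->chipset_known ? c->chipset_base : c->mcfg_base;

    /* Step 2: check the full area, 1 MiB per bus, not just the start. */
    uint64_t len = ((uint64_t)(c->end_bus - c->start_bus) + 1) << 20;
    struct span area = { base, base + len };

    /* Step 3: the area must be reserved and must not sit on top of RAM. */
    if (covered_by_one(mb_res, n_mb, area) && !overlaps_any(ram, n_ram, area))
        return MMCFG_USE;

    /* Step 4: on a known chipset, also ask for decoding to be disabled. */
    return c->chipset_known ? MMCFG_SKIP_AND_DISABLE_DECODE : MMCFG_SKIP;
}
```

Returning a verdict instead of acting directly keeps the policy separate from the chipset-specific code that would actually turn decoding off.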
* Re: PCI Express MMCONFIG and BIOS Bug messages..
From: Jesse Barnes @ 2007-04-29 18:27 UTC
To: Robert Hancock; +Cc: Andi Kleen, Chuck Ebbert, linux-kernel, Len Brown, linux-acpi

On Sunday, April 29, 2007, Robert Hancock wrote:
> Problem is that even if we read the MMCONFIG table location from the
> hardware registers, that doesn't mean we can trust the result. It could
> be that the BIOS hasn't lied about where it put the table, it just stuck
> it someplace completely unsuitable like on top of RAM or other
> registers. It seems that with some of those 965 chipsets the latter is
> what the BIOS is actually doing, and so when we think we're writing to
> the table we're really writing to random chipset registers and hosing
> things. (Jesse Barnes ran into this while trying to add chipset support
> for the 965).

Right, I've updated the BIOS since, but at least that version was totally buggy wrt MMconfig support. I haven't yet looked at the new one to see if it properly reserves MCFG space in ACPI _CRS yet or properly programs it.

> Likely what we need to do is:
>
> -If chipset is known, take table address from registers, otherwise check
> the MCFG table
> -Take the resulting area (Ideally not just the first minimum part as we
> check now, but the full area based on the expected length) and make sure
> that the entire area is covered by a reservation in ACPI motherboard
> resources.
> -If that passes, then we still need to sanity check the result by making
> sure it hasn't been mapped over top of something else important. How to
> do this depends on exactly how they've set up the ACPI reservations on
> these broken boxes.. Does someone have a full dmesg from one on a recent
> kernel that shows all the pnpacpi resource reservation output?
> -If these checks fail, we don't use the table, and the chipset is known,
> we should likely try to disable decoding of the region so that it won't
> get in the way of anything else.

Yeah, that sounds like a good algorithm.

I'm not sure how to handle the fact that we don't have access to the _CRS until late in boot though... Len?

Jesse
* Re: PCI Express MMCONFIG and BIOS Bug messages..
From: Robert Hancock @ 2007-04-29 18:39 UTC
To: Jesse Barnes; +Cc: Andi Kleen, Chuck Ebbert, linux-kernel, Len Brown, linux-acpi

Jesse Barnes wrote:
> On Sunday, April 29, 2007, Robert Hancock wrote:
>> Problem is that even if we read the MMCONFIG table location from the
>> hardware registers, that doesn't mean we can trust the result. It could
>> be that the BIOS hasn't lied about where it put the table, it just stuck
>> it someplace completely unsuitable like on top of RAM or other
>> registers. It seems that with some of those 965 chipsets the latter is
>> what the BIOS is actually doing, and so when we think we're writing to
>> the table we're really writing to random chipset registers and hosing
>> things. (Jesse Barnes ran into this while trying to add chipset support
>> for the 965).
>
> Right, I've updated the BIOS since, but at least that version was totally
> buggy wrt MMconfig support. I haven't yet looked at the new one to see if
> it properly reserves MCFG space in ACPI _CRS yet or properly programs it.
>
>> Likely what we need to do is:
>>
>> -If chipset is known, take table address from registers, otherwise check
>> the MCFG table
>> -Take the resulting area (Ideally not just the first minimum part as we
>> check now, but the full area based on the expected length) and make sure
>> that the entire area is covered by a reservation in ACPI motherboard
>> resources.
>> -If that passes, then we still need to sanity check the result by making
>> sure it hasn't been mapped over top of something else important. How to
>> do this depends on exactly how they've set up the ACPI reservations on
>> these broken boxes.. Does someone have a full dmesg from one on a recent
>> kernel that shows all the pnpacpi resource reservation output?
>> -If these checks fail, we don't use the table, and the chipset is known,
>> we should likely try to disable decoding of the region so that it won't
>> get in the way of anything else.
>
> Yeah, that sounds like a good algorithm.
>
> I'm not sure how to handle the fact that we don't have access to the _CRS
> until late in boot though... Len?

We'd likely have to split the MMCONFIG initialization into two parts. The early part enables MMCONFIG only on systems where we require it (like the Macs that Andi mentioned). On all other systems we defer enabling it (use the regular PCI configuration mechanism) until the second part, after the ACPI interpreter is enabled, where we can poke around in ACPI and verify the table is suitable.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/
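[Editorial sketch] The two-part initialization proposed in this message amounts to a small state machine. The names below (`struct pci_cfg_state`, `mmcfg_early_init()`, `mmcfg_late_init()`) are hypothetical and do not correspond to existing kernel functions. Both the "platform requires MMCONFIG" test and the ACPI verification are passed in as plain booleans, because how to compute them is exactly what the thread leaves open.

```c
#include <stdbool.h>

enum cfg_access { CFG_TYPE1, CFG_MMCONFIG };

struct pci_cfg_state {
    enum cfg_access method;
    bool mmcfg_deferred;     /* decision postponed to the late phase */
};

/* Early boot: the ACPI interpreter is not yet available. */
static void mmcfg_early_init(struct pci_cfg_state *s,
                             bool platform_requires_mmcfg)
{
    if (platform_requires_mmcfg) {
        /* Systems where only MMCONFIG works: enable it right away. */
        s->method = CFG_MMCONFIG;
        s->mmcfg_deferred = false;
    } else {
        /* Everyone else starts on the regular configuration mechanism. */
        s->method = CFG_TYPE1;
        s->mmcfg_deferred = true;
    }
}

/* After the ACPI interpreter is up: act on the verification result. */
static void mmcfg_late_init(struct pci_cfg_state *s, bool acpi_check_passed)
{
    if (!s->mmcfg_deferred)
        return;              /* already decided in the early phase */
    if (acpi_check_passed)
        s->method = CFG_MMCONFIG;
    s->mmcfg_deferred = false;
}
```

The sketch relies on Andi's earlier remark that the access method can be switched later without problems. A system that fails the late check simply stays on Type 1 access.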
Thread overview: 7+ messages (end of thread; newest: ~2007-04-29 18:40 UTC)

2007-04-19  1:25 PCI Express MMCONFIG and BIOS Bug messages Robert Hancock
2007-04-19 16:05 ` Chuck Ebbert
2007-04-29  1:22   ` Robert Hancock
2007-04-29 10:12     ` Andi Kleen
2007-04-29 18:20       ` Robert Hancock
2007-04-29 18:27         ` Jesse Barnes
2007-04-29 18:39           ` Robert Hancock