* PCI resource problems caused by improper address rounding
From: Chuck Ebbert @ 2007-12-18 0:25 UTC
To: linux-kernel; +Cc: Ivan Kokshaysky, Linus Torvalds, Daniel Ritz, Greg KH
Looks like a commit that I can't find in git due to the arch merge
has broken PCI address assignment. This patch by Richard Henderson
against 2.6.23 fixes it for x86_64:
--- linux-2.6.23.x86_64/arch/x86_64/kernel/e820.c 2007-10-09 13:31:38.000000000 -0700
+++ linux-2.6.23.x86_64-rth/arch/x86_64/kernel/e820.c 2007-12-15 12:37:44.000000000 -0800
@@ -718,8 +718,8 @@ __init void e820_setup_gap(void)
 	while ((gapsize >> 4) > round)
 		round += round;
 	/* Fun with two's complement */
-	pci_mem_start = (gapstart + round) & -round;
+	pci_mem_start = (gapstart + round - 1) & -round;
 	printk(KERN_INFO "Allocating PCI resources starting at %lx (gap: %lx:%lx)\n",
 		pci_mem_start, gapstart, gapsize);
Here is the original changeset, taken from the Mercurial repo. It was
merged in 2.6.14:
# HG changeset patch
# User Daniel Ritz <daniel.ritz@gmx.ch>
# Date 1126304746 -700
# Node ID 51367d6e0b839be0b425a8f67c29f625b670f126
# Parent f4852c862b04efc9f8e2c7913191f5f7d140d895
[PATCH] Update PCI IOMEM allocation start
This fixes the problem with "Averatec 6240 pcmcia_socket0: unable to
apply power", which was due to the CardBus IOMEM register region being
allocated at an address that was actually inside the RAM window that had
been reserved for video frame-buffers in a UMA setup.
The BIOS _should_ have marked that region reserved in the e820 memory
descriptor tables, but did not.
It is fixed by rounding up the default starting address of PCI memory
allocations, so that we leave a bigger gap after the final known memory
location. The amount of rounding depends on how big the unused memory
gap is that we can allocate IOMEM from.
Based on example code by Linus.
Acked-by: Greg KH <greg@kroah.com>
Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
committer: Linus Torvalds <torvalds@g5.osdl.org> 1126304746 -0700
--- a/arch/i386/kernel/setup.c Fri Sep 09 22:28:40 2005 +0011
+++ b/arch/i386/kernel/setup.c Fri Sep 09 22:37:26 2005 +0011
@@ -1300,7 +1300,7 @@ legacy_init_iomem_resources(struct resou
  */
 static void __init register_memory(void)
 {
-	unsigned long gapstart, gapsize;
+	unsigned long gapstart, gapsize, round;
 	unsigned long long last;
 	int i;
@@ -1345,14 +1345,14 @@ static void __init register_memory(void)
 	}
 	/*
-	 * Start allocating dynamic PCI memory a bit into the gap,
-	 * aligned up to the nearest megabyte.
-	 *
-	 * Question: should we try to pad it up a bit (do something
-	 * like " + (gapsize >> 3)" in there too?). We now have the
-	 * technology.
+	 * See how much we want to round up: start off with
+	 * rounding to the next 1MB area.
 	 */
-	pci_mem_start = (gapstart + 0xfffff) & ~0xfffff;
+	round = 0x100000;
+	while ((gapsize >> 4) > round)
+		round += round;
+	/* Fun with two's complement */
+	pci_mem_start = (gapstart + round) & -round;
 	printk("Allocating PCI resources starting at %08lx (gap: %08lx:%08lx)\n",
 		pci_mem_start, gapstart, gapsize);
--- a/arch/x86_64/kernel/e820.c Fri Sep 09 22:28:40 2005 +0011
+++ b/arch/x86_64/kernel/e820.c Fri Sep 09 22:37:26 2005 +0011
@@ -567,7 +567,7 @@ unsigned long pci_mem_start = 0xaeedbabe
  */
 __init void e820_setup_gap(void)
 {
-	unsigned long gapstart, gapsize;
+	unsigned long gapstart, gapsize, round;
 	unsigned long last;
 	int i;
 	int found = 0;
@@ -604,14 +604,14 @@ __init void e820_setup_gap(void)
 	}
 	/*
-	 * Start allocating dynamic PCI memory a bit into the gap,
-	 * aligned up to the nearest megabyte.
-	 *
-	 * Question: should we try to pad it up a bit (do something
-	 * like " + (gapsize >> 3)" in there too?). We now have the
-	 * technology.
+	 * See how much we want to round up: start off with
+	 * rounding to the next 1MB area.
 	 */
-	pci_mem_start = (gapstart + 0xfffff) & ~0xfffff;
+	round = 0x100000;
+	while ((gapsize >> 4) > round)
+		round += round;
+	/* Fun with two's complement */
+	pci_mem_start = (gapstart + round) & -round;
 	printk(KERN_INFO "Allocating PCI resources starting at %lx (gap: %lx:%lx)\n",
 		pci_mem_start, gapstart, gapsize);

* Re: PCI resource problems caused by improper address rounding
From: Linus Torvalds @ 2007-12-18 0:57 UTC
To: Chuck Ebbert; +Cc: linux-kernel, Ivan Kokshaysky, Daniel Ritz, Greg KH

On Mon, 17 Dec 2007, Chuck Ebbert wrote:
>
> Looks like a commit that I can't find in git due to the arch merge
> has broken PCI address assignment. This patch by Richard Henderson
> against 2.6.23 fixes it for x86_64:
>
> --- linux-2.6.23.x86_64/arch/x86_64/kernel/e820.c 2007-10-09 13:31:38.000000000 -0700
> +++ linux-2.6.23.x86_64-rth/arch/x86_64/kernel/e820.c 2007-12-15 12:37:44.000000000 -0800
> @@ -718,8 +718,8 @@ __init void e820_setup_gap(void)
>  	while ((gapsize >> 4) > round)
>  		round += round;
>  	/* Fun with two's complement */
> -	pci_mem_start = (gapstart + round) & -round;
> +	pci_mem_start = (gapstart + round - 1) & -round;

No, it's very much meant to be that way.

We do *not* want to have the PCI memory abut the end of memory exactly. So
it leaves a gap in between "gapstart" and the actual start of PCI memory
addressing very much on purpose.

In fact, the very commit (it's f0eca9626c6becb6fc56106b2e4287c6c784af3d in
the kernel tree) you mention actually explicitly *explains* that, although
maybe it's a bit indirect: if you start allocating PCI resources directly
after the end-of-RAM thing, you can easily end up using addresses that are
actually inside the magic stolen system RAM that is being used for UMA
video etc.

So you very much want to have a buffer in between the end-of-RAM and the
actual start of the region we try to allocate in.

So why do you want them to be close, anyway?

		Linus

PS. On a different topic: if you do

	git log --follow arch/x86/kernel/e820_64.c

you'd see the history past the renames in git. Or just do a "git blame -C"
which will also follow renames (and copies).
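
The difference between the two expressions being argued over can be made concrete with a small stand-alone C sketch. It is an illustration only, not kernel code: the helper names (pick_round, start_with_buffer, start_align_up) are hypothetical, and the addresses mentioned in the note after the block are example values rather than figures reported in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Pick the rounding granule the way the quoted loop does: start at 1MB
 * and keep doubling until it is at least one sixteenth of the gap. */
static inline uint64_t pick_round(uint64_t gapsize)
{
	uint64_t round = 0x100000;

	while ((gapsize >> 4) > round)
		round += round;
	return round;
}

/* Expression currently in the tree. It always moves past gapstart, so a
 * gapstart that is already aligned still gets `round` bytes of padding. */
static inline uint64_t start_with_buffer(uint64_t gapstart, uint64_t round)
{
	assert(round && (round & (round - 1)) == 0); /* power of two */
	return (gapstart + round) & -round;
}

/* Expression from the proposed patch. This is a plain align-up: an
 * already aligned gapstart is returned unchanged, with no padding. */
static inline uint64_t start_align_up(uint64_t gapstart, uint64_t round)
{
	assert(round && (round & (round - 1)) == 0); /* power of two */
	return (gapstart + round - 1) & -round;
}
```

For an assumed 512MB gap (0x20000000) the granule comes out as 32MB. With an aligned gapstart of 0xc0000000 the first expression yields 0xc2000000 while the second yields 0xc0000000; for an unaligned gapstart such as 0xbff00000 both yield 0xc0000000. The two formulas therefore differ only when gapstart is already a multiple of the granule, which is exactly the case where the buffer matters.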

* Re: PCI resource problems caused by improper address rounding
From: Chuck Ebbert @ 2007-12-18 17:34 UTC
To: Linus Torvalds; +Cc: linux-kernel, Ivan Kokshaysky, Daniel Ritz, Greg KH

On 12/17/2007 07:57 PM, Linus Torvalds wrote:
>
> On Mon, 17 Dec 2007, Chuck Ebbert wrote:
>> Looks like a commit that I can't find in git due to the arch merge
>> has broken PCI address assignment. This patch by Richard Henderson
>> against 2.6.23 fixes it for x86_64:
>>
>> --- linux-2.6.23.x86_64/arch/x86_64/kernel/e820.c 2007-10-09 13:31:38.000000000 -0700
>> +++ linux-2.6.23.x86_64-rth/arch/x86_64/kernel/e820.c 2007-12-15 12:37:44.000000000 -0800
>> @@ -718,8 +718,8 @@ __init void e820_setup_gap(void)
>>  	while ((gapsize >> 4) > round)
>>  		round += round;
>>  	/* Fun with two's complement */
>> -	pci_mem_start = (gapstart + round) & -round;
>> +	pci_mem_start = (gapstart + round - 1) & -round;
>
> No, it's very much meant to be that way.
>
> We do *not* want to have the PCI memory abut the end of memory exactly. So
> it leaves a gap in between "gapstart" and the actual start of PCI memory
> addressing very much on purpose.
>
> In fact, the very commit (it's f0eca9626c6becb6fc56106b2e4287c6c784af3d in
> the kernel tree) you mention actually explicitly *explains* that, although
> maybe it's a bit indirect: if you start allocating PCI resources directly
> after the end-of-RAM thing, you can easily end up using addresses that are
> actually inside the magic stolen system RAM that is being used for UMA
> video etc.
>
> So you very much want to have a buffer in between the end-of-RAM and the
> actual start of the region we try to allocate in.
>
> So why do you want them to be close, anyway?
>

Because otherwise some video adapters with 256MB of memory end up with their
resources allocated above 4GB, and that doesn't work very well.

https://bugzilla.redhat.com/show_bug.cgi?id=425794#c0

>
> PS. On a different topic: if you do
>
>	git log --follow arch/x86/kernel/e820_64.c
>
> you'd see the history past the renames in git. Or just do a "git blame -C"
> which will also follow renames (and copies).

The history in the web interface just ends at the rename.

* Re: PCI resource problems caused by improper address rounding
From: Linus Torvalds @ 2007-12-18 18:21 UTC
To: Chuck Ebbert
Cc: linux-kernel, Ivan Kokshaysky, Daniel Ritz, Greg KH, Richard Henderson

On Tue, 18 Dec 2007, Chuck Ebbert wrote:
> >
> > So why do you want them to be close, anyway?
>
> Because otherwise some video adapters with 256MB of memory end up with their
> resources allocated above 4GB, and that doesn't work very well.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=425794#c0

That bugzilla entry doesn't even have a dmesg output or anything like
that. I'd really like to see what the

The fact is, that patch is not safe. We very much _want_ to make the PCI
region allocator use a window that is in the *middle* of the gap, and not
close to either end of the gap, and the code literally tries to make the
default start of the PCI allocation gap start be about 1/16th of the
actual gap size in question, so that we don't hit BIOS allocations that it
didn't tell us about by mistake.

But without dmesg and lspci output to see what the actual allocations are,
there's no way to even _guess_ at whether there is a correct fix or not,
just the fix that totally misses the point of having any rounding-up at
all.

That patch might as well just do

	pci_mem_start = gapstart;

and get rid of all that rounding code entirely, since the patch just
assumes that it's safe to use memory after gapstart (which is known to not
be true, and is the whole and only reason for that code in the first
place: BIOSes *invariably* get resource allocation wrong, and forget to
tell us about some resource they set up).

Now, it's entirely possible that the only reasonable end result is that we
do have to avoid rounding up that far, but I definitely want to see what
the actual resource situation is - that patch is *not* obviously correct,
and it definitely breaks the whole point of the code.

The *other* patch in the bugzilla entry seems more correct, in that yes,
we should make sure that we don't allocate resources over 4G if the
resource won't fit. That said, I think that patch is wrong too: we should
just fix pcibios_align_resource() to check that case for MEM resources (the
same way it already knows about the magic rules for IO resources).

So I'd suggest just fixing pcibios_align_resource() instead. Something
like the appended might work (and then you could perhaps quirk it to
always clear the PCI_BASE_ADDRESS_MEM_TYPE_64 thing for VGA controllers,
although I really don't think the kernel is the right place to do that,
and that would be an X server issue!).

NOTE! This patch is an independent issue of the whole "what window do we
use to allocate new resources, and how do we align it" thing.

		Linus

---
 arch/x86/pci/i386.c |   14 ++++++++++++++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
index 42ba0e2..abc642b 100644
--- a/arch/x86/pci/i386.c
+++ b/arch/x86/pci/i386.c
@@ -70,6 +70,20 @@ pcibios_align_resource(void *data, struct resource *res,
 			start = (start + 0x3ff) & ~0x3ff;
 			res->start = start;
 		}
+	} else {
+		u64 max;
+		switch (res->flags & PCI_BASE_ADDRESS_MEM_MASK) {
+		case PCI_BASE_ADDRESS_MEM_TYPE_1M:
+			max = 0xfffff;
+			break;
+		case PCI_BASE_ADDRESS_MEM_TYPE_64:
+			max = -1;
+			break;
+		default:
+			max = 0xffffffff;
+		}
+		if (res->start > max)
+			res->start = res->end;
 	}
 }
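
The idea behind the suggested pcibios_align_resource() change, namely that each memory BAR type implies a highest address it can decode, can be sketched in user-space C as below. This is a hedged illustration and not the kernel implementation: the BAR_MEM_TYPE_* constants are local stand-ins chosen to mirror the values of the kernel's PCI_BASE_ADDRESS_MEM_TYPE_* definitions, the helpers bar_max_address() and bar_fits() are hypothetical, and the sketch returns a yes/no answer instead of poisoning res->start the way the patch above does.

```c
#include <assert.h>
#include <stdint.h>

/* Memory BAR type field (bits 2:1 of the BAR). Local stand-ins that
 * mirror the kernel's PCI_BASE_ADDRESS_MEM_TYPE_* values. */
#define BAR_MEM_TYPE_MASK 0x06u
#define BAR_MEM_TYPE_32   0x00u	/* anywhere below 4GB */
#define BAR_MEM_TYPE_1M   0x02u	/* below 1MB */
#define BAR_MEM_TYPE_64   0x04u	/* anywhere in 64-bit space */

/* Highest address a memory BAR of the given type can be programmed to. */
static inline uint64_t bar_max_address(unsigned long flags)
{
	switch (flags & BAR_MEM_TYPE_MASK) {
	case BAR_MEM_TYPE_1M:
		return 0xfffffULL;
	case BAR_MEM_TYPE_64:
		return UINT64_MAX;
	default:
		return 0xffffffffULL;
	}
}

/* Return 1 if [start, start + size - 1] lies entirely within the range
 * the BAR type can decode, 0 otherwise. Written to avoid overflow. */
static inline int bar_fits(unsigned long flags, uint64_t start, uint64_t size)
{
	uint64_t max = bar_max_address(flags);

	assert(size > 0); /* a BAR always decodes at least one byte */
	if (start > max)
		return 0;
	return size - 1 <= max - start;
}
```

Under these assumptions a 256MB region at 0xc0000000 fits a 32-bit BAR, while the same region placed at 0x100000000 fits only a 64-bit BAR.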

* Re: PCI resource problems caused by improper address rounding
From: Richard Henderson @ 2007-12-18 20:22 UTC
To: Linus Torvalds
Cc: Chuck Ebbert, linux-kernel, Ivan Kokshaysky, Daniel Ritz, Greg KH

On Tue, Dec 18, 2007 at 10:21:50AM -0800, Linus Torvalds wrote:
> > https://bugzilla.redhat.com/show_bug.cgi?id=425794#c0
>
> That bugzilla entry doesn't even have a dmesg output or anything like
> that. I'd really like to see what the

I've added dmesg, /proc/iomem, and lspci -v output to that bug.

Basically, we have

	c0000000-cfffffff : free
	ddf00000-dfefffff : PCI Bus #04
	e0000000-efffffff : pnp 00:0b
	f0000000-fedfffff : less than 256MB

The annoying part is that there's no device (that I can see) behind
PCI Bus #04, so it might as well be disabled and that entire
d0000000-dfffffff area reclaimed.

> That patch might as well just do
>
>	pci_mem_start = gapstart;
>
> and get rid of all that rounding code entirely, since the patch just
> assumes that it's safe to use memory after gapstart (which is known to not
> be true, and is the whole and only reason for that code in the first
> place: BIOSes *invariably* get resource allocation wrong, and forget to
> tell us about some resource they set up).

That would have been an excellent comment to add to that code then,
rather than just "rounding up to the next 1MB area", because purely
as rounding code it is erroneous.

> The *other* patch in the bugzilla entry seems more correct, in that yes,
> we should make sure that we don't allocate resources over 4G if the
> resource won't fit. That said, I think that patch is wrong too: we should
> just fix pcibios_align_resource() to check that case for MEM resources (the
> same way it already knows about the magic rules for IO resources).

I'll give that patch a try, modified a tad to still include the
force_32_bit quirk.

> So I'd suggest just fixing pcibios_align_resource() instead. Something
> like the appended might work (and then you could perhaps quirk it to
> always clear the PCI_BASE_ADDRESS_MEM_TYPE_64 thing for VGA controllers,

That won't work, because PCI_BASE_ADDRESS_MEM_TYPE_64 controls how
many bits need to be written back to the BAR. If we changed that to
PCI_BASE_ADDRESS_MEM_TYPE_32, we wouldn't clear the high 32-bits of
the BAR.

> ... and that would be an X server issue!).

Of course, fixing the X server to *handle* 64-bit BARs is the correct
solution. I've no idea how involved that is, but I have a sneaking
suspicion that it uses that damned CARD32 datatype for everything.

r~

* Re: PCI resource problems caused by improper address rounding
From: Linus Torvalds @ 2007-12-18 21:09 UTC
To: Richard Henderson
Cc: Chuck Ebbert, linux-kernel, Ivan Kokshaysky, Daniel Ritz, Greg KH, Keith Packard, Bjorn Helgaas

On Tue, 18 Dec 2007, Richard Henderson wrote:
>
> I've added dmesg, /proc/iomem, and lspci -v output to that bug.
>
> Basically, we have
>
>	c0000000-cfffffff : free
>	ddf00000-dfefffff : PCI Bus #04
>	e0000000-efffffff : pnp 00:0b
>	f0000000-fedfffff : less than 256MB

Gaah. That really is very unlucky. That 256M only goes at one point in the
low 4GB, but the thing is, it fits perfectly well above it, and dammit,
that resource is explicitly a 64-bit resource for a really good reason.

However, I wonder about that

	e0000000-efffffff : pnp 00:0b

thing. I actually suspect that that whole allocation is literally *meant*
for that 256MB graphics aperture, but the kernel explicitly avoids it
because it's listed in the PnP tables.

I wonder what the heck is the point of that pnp entry. Just for fun, can
you try to just disable CONFIG_PNP, and see if it all works then?

Björn Helgaas added to Cc to clarify what those pnp entries tend to mean,
and whether there is possibly some way to match up a specific pnp entry
with the PCI device that might want to use it. Because that is a nice
256MB region that really doesn't seem to make sense for anything else than
the graphics buffer - there's nothing else in your system that seems
likely (although I guess it could be for some docking port, but even then
I'd have expected one of the PCI bridges to map it!)

But apart from the question about that pnp 00:0b device, the kernel
resource allocation really does look perfectly fine, and while we could
shoe-horn it into the low 4GB in this case by just hoping that there is
nothing undocumented there (and there probably isn't), it's really
annoying considering that big graphics areas are a hell of a good reason
to use those 64-bit resources.

It's not like 256MB is even as large as they come, half-gig graphics cards
are getting to be fairly common at the high end, and X absolutely _has_ to
be able to handle a 64-bit address for those.

Also, I'm surprised it doesn't work with X already: the ChangeLog for X
says that there are "Minor fixes to the handling of 64-bit PCI BARs [..]"
in 4.6.99.18, so I'd have assumed that XFree86-4.7.0 should be able to
handle this perfectly well.

I'll add Keithp to the cc too, to see if the X issues can be clarified.
Maybe he can set us right. But maybe you just have an old X server?

If so, considering the situation, I really think the kernel has done a
good job already, and I'd be *very* nervous about making the kernel
allocate new PCI resources right after the end-of-memory thing. I bet it
would work in this case, but as mentioned, we definitely know of cases
where the BIOS did *not* document the magic memory region that was stolen
for UMA graphics, and trying to put PCI devices just after the top of
reserved memory in the e820 list causes machines to not work at all
because the address decoding will clash.

Of course, we could also make the minimum address more of a *hint*, and
make the resource allocator only abut the top-of-known-memory when it
absolutely has to, but on the other hand, in this case it really doesn't
have to, since there's just _tons_ of space for 64-bit resources. So the
correct thing really does seem to be to just use the 64-bit hw that is
there.

> That would have been an excellent comment to add to that code then,
> rather than just "rounding up to the next 1MB area", because purely
> as rounding code it is erroneous.

Patches to add comments are welcome. There are few enough people who
actually work on the PCI resource allocation code these days (I wish there
were more), and it's very rare that anybody else than me or Ivan ends up
even *looking* at it. So it's not been a big issue.

		Linus

* Re: PCI resource problems caused by improper address rounding
From: Chuck Ebbert @ 2007-12-18 21:46 UTC
To: Linus Torvalds
Cc: Richard Henderson, linux-kernel, Ivan Kokshaysky, Daniel Ritz, Greg KH, Keith Packard, Bjorn Helgaas

On 12/18/2007 04:09 PM, Linus Torvalds wrote:
>
> I wonder what the heck is the point of that pnp entry. Just for fun, can
> you try to just disable CONFIG_PNP, and see if it all works then?
>

pnpacpi=off should work.

PnP is also trying (and failing) to reserve all physical memory.

* Re: PCI resource problems caused by improper address rounding
From: Linus Torvalds @ 2007-12-18 21:56 UTC
To: Chuck Ebbert
Cc: Richard Henderson, linux-kernel, Ivan Kokshaysky, Daniel Ritz, Greg KH, Keith Packard, Bjorn Helgaas

On Tue, 18 Dec 2007, Chuck Ebbert wrote:
> On 12/18/2007 04:09 PM, Linus Torvalds wrote:
> >
> > I wonder what the heck is the point of that pnp entry. Just for fun, can
> > you try to just disable CONFIG_PNP, and see if it all works then?
>
> pnpacpi=off should work.
>
> PnP is also trying (and failing) to reserve all physical memory.

Yeah, that really is a pretty confused-looking pnp table thing. But I have
absolutely zero idea how PnP is even supposed to work - the whole thing is
just a total hack for Windows, afaik.

The sad part is that *normally* the right thing to do about almost any
BIOS information is what we do right now: just avoid that magic address
range like the plague, because we have no clue what the heck the BIOS is
up to. But it looks like in this particular case, some of the problems may
arise exactly *because* we avoid that range.

It would be good to know what Windows does. If ACPI is found, does it
perhaps just ignore all the PnP entries these days?

		Linus

* Re: PCI resource problems caused by improper address rounding
From: Richard Henderson @ 2007-12-18 22:17 UTC
To: Chuck Ebbert
Cc: Linus Torvalds, linux-kernel, Ivan Kokshaysky, Daniel Ritz, Greg KH, Keith Packard, Bjorn Helgaas

On Tue, Dec 18, 2007 at 04:46:09PM -0500, Chuck Ebbert wrote:
> pnpacpi=off should work.

This does result in the graphics bar being placed at e0000000,
and does result in a system lockup when X starts. So it appears
as if there's really something there.

r~

* Re: PCI resource problems caused by improper address rounding
From: Richard Henderson @ 2007-12-18 21:51 UTC
To: Linus Torvalds
Cc: Chuck Ebbert, linux-kernel, Ivan Kokshaysky, Daniel Ritz, Greg KH, Keith Packard, Bjorn Helgaas

On Tue, Dec 18, 2007 at 01:09:15PM -0800, Linus Torvalds wrote:
> However, I wonder about that
>
>	e0000000-efffffff : pnp 00:0b
>
> thing. I actually suspect that that whole allocation is literally *meant*
> for that 256MB graphics aperture, but the kernel explicitly avoids it
> because it's listed in the PnP tables.

I assumed it was reserved for the pccard thing, which I don't
see listed or allocated anywhere else.

> I wonder what the heck is the point of that pnp entry. Just for fun, can
> you try to just disable CONFIG_PNP, and see if it all works then?

I'll try that, for grins.

> I'll add Keithp to the cc too, to see if the X issues can be clarified.
> Maybe he can set us right. But maybe you just have an old X server?

I've got xorg-x11-server-Xorg-1.3.0.0-36.fc8 installed. I wouldn't
have thought that was too old, since Fedora 8 just came out, but it's
not like I keep up on these things. I'll give 1.4.99 a try, as that's
what's current in Rawhide.

> Of course, we could also make the minimum address more of a *hint*, and
> make the resource allocator only abut the top-of-known-memory when it
> absolutely has to....

Another way to look at this is that the graphics BAR came in from
the BIOS allocated at c0000000, and we ignored that.

Perhaps there's a way to give weight to the BIOS settings when
considering where the PCI region is supposed to start? On that system
for which there were undeclared resources, did the BIOS avoid that
resource for the other PCI resources? I suspect it did...

r~

* Re: PCI resource problems caused by improper address rounding
From: Linus Torvalds @ 2007-12-18 22:31 UTC
To: Richard Henderson
Cc: Chuck Ebbert, linux-kernel, Ivan Kokshaysky, Daniel Ritz, Greg KH, Keith Packard, Bjorn Helgaas

On Tue, 18 Dec 2007, Richard Henderson wrote:
>
> Another way to look at this is that the graphics BAR came in from
> the BIOS allocated at c0000000, and we ignored that.

We did?

> Perhaps there's a way to give weight to the BIOS settings when
> considering where the PCI region is supposed to start?

Normally, we *always* keep the BIOS allocations, unless it explicitly
clashes with something and we have reason to believe that they cannot
work.

And there's nothing it clashes with, so we definitely *should* have kept
it. Why do you think it came pre-allocated at 0xc0000000? I'm seeing the
message

	PCI: Cannot allocate resource region 1 of device 0000:01:00.0

and I can well imagine that that is it, but if it was a valid allocation,
then we really should have kept it there.

That question also brings up another issue: how come did we actually
choose address 0xc0000000 with the original patch you sent in? If we can't
find it in the parent resources, we shouldn't have accepted it even if it
had room for it!

Which brings up *another* potential fix for this thing: as mentioned,
Intel bridges often claim to be "Normal decode", but the core ones seem to
almost never actually be that, and they tend to be "Negative decode".

So what may be going on is:

 - the kernel sees that BIOS allocation at 0xc0000000 (I'll take your word
   for it, it doesn't actually say so without PCI debugging enabled)

 - ...and notices that the PCI BAR is behind a PCI bridge that does not
   claim to be able to actually bridge that resource (it's normal decode,
   and the ranges it *does* decode are elsewhere!)

 - so clearly that old allocation is pure crap and has to be re-done in a
   range that is actually properly bridged.

but that decision bases itself on the Intel bridge not lying, and if it
turns out that the bridge at 0000:00:01.0 actually is transparent, then
the original allocation would have been ok.

That said, your bridge at 00:1e.0 is *also* transparent, and it's actually
against the PCI specs to have two transparent bridges on the same PCI bus,
so I'm a bit surprised about that. But it does bring up a new thing you
could *try*, namely this patch...

(You obviously have to replace "insert-your-device-id-here" with the proper
PCI device ID for the thing you have - your lspci output only gives the
name, not the numbers)

We seem to have a multitude of possible reasons for this insanity. It
would be interesting to hear which one(s) of the possibilities make a
difference, if any.

		Linus

---
 drivers/pci/quirks.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 26cc4dc..c3b52f5 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -820,6 +820,7 @@ static void __devinit quirk_transparent_bridge(struct pci_dev *dev)
 {
 	dev->transparent = 1;
 }
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, insert-your-device-id-here, quirk_transparent_bridge );
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82380FB, quirk_transparent_bridge );
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_TOSHIBA, 0x605, quirk_transparent_bridge );

* Re: PCI resource problems caused by improper address rounding
From: Linus Torvalds @ 2007-12-19 1:38 UTC
To: Richard Henderson
Cc: Chuck Ebbert, linux-kernel, Ivan Kokshaysky, Daniel Ritz, Greg KH, Keith Packard, Bjorn Helgaas

On Tue, 18 Dec 2007, Linus Torvalds wrote:
>
> That question also brings up another issue: how come did we actually
> choose address 0xc0000000 with the original patch you sent in? If we can't
> find it in the parent resources, we shouldn't have accepted it even if it
> had room for it!

That

	PCI: Cannot allocate resource region 9 of bridge 0000:00:01.0
	PCI: Cannot allocate resource region 1 of device 0000:01:00.0

thing is really starting to bug me.

I bet that is the real problem here, but it's not printing out enough
information about the resource to actually give us much of a clue about
what is wrong.

I suspect that it had a bridge mapping (device 0:01.0) that included the
range from 0xc0000000 to 0xcfffffff, but there was something stupid wrong
with it (eg the BIOS had allocated overlapping regions), so we disabled
it. That, in turn, then caused us to also refuse the existing 0xc0000000
mapping for the graphics card (device 01:00.0), because now there was no
valid resource for it.

But that PCI bridge resource handling happens even *before* we look at any
PnP reserved areas (because we - for really good reasons - trust the
hardware a _lot_ more than we trust any idiotic firmware tables), so I
wonder what that strange PCI bridge mapping in 00:01.0 was - it must have
been _really_ off in order to not fit in the resource tree.

Could you just make it print out what the bridge resources are when it
scans them? Something like the appended..

		Linus

---
diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
index 42ba0e2..37c4b92 100644
--- a/arch/x86/pci/i386.c
+++ b/arch/x86/pci/i386.c
@@ -117,11 +117,16 @@ static void __init pcibios_allocate_bus_resources(struct list_head *bus_list)
 	/* Depth-First Search on bus tree */
 	list_for_each_entry(bus, bus_list, node) {
 		if ((dev = bus->self)) {
+			printk(KERN_DEBUG "PCI: Bridge %s\n", pci_name(dev));
 			for (idx = PCI_BRIDGE_RESOURCES;
 			    idx < PCI_NUM_RESOURCES; idx++) {
 				r = &dev->resource[idx];
 				if (!r->flags)
 					continue;
+				printk(KERN_DEBUG "PCI: Bridge resource "
+					"%08llx-%08llx (f=%lx)\n",
+					r->start, r->end, r->flags);
+
 				pr = pci_find_parent_resource(dev, r);
 				if (!r->start || !pr ||
 				    request_resource(pr, r) < 0) {

* Re: PCI resource problems caused by improper address rounding
From: Richard Henderson @ 2007-12-20 21:52 UTC
To: Linus Torvalds
Cc: Chuck Ebbert, linux-kernel, Ivan Kokshaysky, Daniel Ritz, Greg KH, Keith Packard, Bjorn Helgaas

On Tue, Dec 18, 2007 at 05:38:58PM -0800, Linus Torvalds wrote:
> That
>
>	PCI: Cannot allocate resource region 9 of bridge 0000:00:01.0
>	PCI: Cannot allocate resource region 1 of device 0000:01:00.0
>
> thing is really starting to bug me.
>
> I bet that is the real problem here, but it's not printing out enough
> information about the resource to actually give us much of a clue about
> what is wrong.
>
> I suspect that it had a bridge mapping (device 0:01.0) that included the
> range from 0xc0000000 to 0xcfffffff, but there was something stupid wrong
> with it (eg the BIOS had allocated overlapping regions), so we disabled
> it. That, in turn, then caused us to also refuse the existing 0xc0000000
> mapping for the graphics card (device 01:00.0), because now there was no
> valid resource for it.

That is exactly it. The relevant section of the debug info is

	PCI: Bridge 0000:00:01.0
	PCI: Bridge resource 7 00008000-00008fff (%f=100)
	PCI: Bridge resource 8 f7d00000-fddfffff (%f=200)
	PCI: Bridge resource 9 bdf00000-ddefffff (%f=1201)

The bridge was assigned to a piece of the end of physical memory.

r~
* Re: PCI resource problems caused by improper address rounding 2007-12-20 21:52 ` Richard Henderson @ 2007-12-20 22:24 ` Linus Torvalds 2007-12-21 0:39 ` Richard Henderson 2007-12-21 2:28 ` Benjamin Herrenschmidt 0 siblings, 2 replies; 33+ messages in thread From: Linus Torvalds @ 2007-12-20 22:24 UTC (permalink / raw) To: Richard Henderson Cc: Chuck Ebbert, linux-kernel, Ivan Kokshaysky, Daniel Ritz, Greg KH, Keith Packard, Bjorn Helgaas On Thu, 20 Dec 2007, Richard Henderson wrote: > On Tue, Dec 18, 2007 at 05:38:58PM -0800, Linus Torvalds wrote: > > That > > > > PCI: Cannot allocate resource region 9 of bridge 0000:00:01.0 > > PCI: Cannot allocate resource region 1 of device 0000:01:00.0 > > > > thing is really starting to bug me. > > > > I bet that is the real problem here, but it's not printing out enough > > information about the resource to actually give us much of a clue about > > what is wrong. > > > > I suspect that it had a bridge mapping (device 0:01.0) that included the > > range from 0xc0000000 to 0xcfffffff, but there was something stupid wrong > > with it (eg the BIOS had allocated overlapping regions), so we disabled > > it. That, in turn, then caused us to also refuse the existing 0xc0000000 > > mapping for the graphics card (device 01:00.0), because now there was no > > valid resource for it. > > That is exactly it. The relevant section of the debug info is > > PCI: Bridge 0000:00:01.0 > PCI: Bridge resource 7 00008000-00008fff (%f=100) > PCI: Bridge resource 8 f7d00000-fddfffff (%f=200) > PCI: Bridge resource 9 bdf00000-ddefffff (%f=1201) > > The bridge was assigned to a piece of the end of physical memory. Oh, wow. That's just really bogus. So the kernel message about PCI: Cannot allocate resource region 9 of bridge 0000:00:01.0 was perfectly fine, and we did absolutely the right thing. 
But it also says that if the graphics adaptor really had a resource mapped
at 0xc0000000 - 0xcfffffff by the BIOS, then that mapping never worked at
all, since it never had any bridge mapping it could rely on. So our
decision to unmap that one as invalid was _also_ right.

Damn. Very irritating.

You know what? I think this simple (BUT TOTALLY UNTESTED!) patch will get
your case right, and I think it is preferable to just always lowering the
"minimum" starting point.

What it does is to just take the minimum PCI address for new allocations
(which is only used for the case where we don't have an explicit starting
point for the parent bus anyway!), and just saying "we'll always align it
down to the required alignment of the allocation".

I'm not exactly 100% happy with it, but it does mean that if we need a big
area, we'll relax the suggested starting point by that amount. It's not
wonderful, but it essentially admits that the minimum for the allocations
is really just a hint, and if we need lots of space for a resource, we'll
relax the minimum point appropriately.

So in your case, it should *result* in the exact same situation that your
patch did, but at the same time, when dealing with the (more common) case
of smaller allocations, we still continue to try to avoid being too close
to the top-of-memory.

So it's not perfect, but perhaps it is a good compromise between being
careful and having to make room?

Does this work for your case?

		Linus
---
 drivers/pci/bus.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 9e5ea07..d48d270 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -61,7 +61,7 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,

 		/* Ok, try it out.. */
 		ret = allocate_resource(r, res, size,
-					r->start ? : min,
+					r->start ? : min & -align,
 					-1, align,
 					alignf, alignf_data);
 		if (ret == 0)
^ permalink raw reply related [flat|nested] 33+ messages in thread
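[Editorial sketch, not part of the original message.] The one-line change
above relies on two details of C: "&" binds tighter than "?:", so only
"min" is masked, and for a power-of-two "align" the two's complement value
"-align" is a mask that clears the low bits. A minimal user-space sketch of
that expression follows. The helper names align_down() and alloc_start()
and the addresses in the usage note are assumptions made for illustration;
they are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round addr down to a multiple of align.
 * align must be a nonzero power of two, so that -align (two's
 * complement) has the low log2(align) bits cleared. */
static uint64_t align_down(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return addr & -align;
}

/* Mirrors "r->start ? : min & -align" from the patch: an explicit
 * parent start wins; otherwise the soft minimum is aligned down to
 * the alignment required by this particular allocation. */
static uint64_t alloc_start(uint64_t parent_start, uint64_t min,
			    uint64_t align)
{
	return parent_start ? parent_start : align_down(min, align);
}
```

With an illustrative soft minimum of 0xc4000000, a 256MB-aligned request
would start its search at 0xc0000000, while a 1MB-aligned request would
still start at 0xc4000000, which is the "relax the hint only for big
allocations" behaviour described in the message.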
* Re: PCI resource problems caused by improper address rounding 2007-12-20 22:24 ` Linus Torvalds @ 2007-12-21 0:39 ` Richard Henderson 2007-12-21 1:00 ` Linus Torvalds 2007-12-21 2:28 ` Benjamin Herrenschmidt 1 sibling, 1 reply; 33+ messages in thread From: Richard Henderson @ 2007-12-21 0:39 UTC (permalink / raw) To: Linus Torvalds Cc: Chuck Ebbert, linux-kernel, Ivan Kokshaysky, Daniel Ritz, Greg KH, Keith Packard, Bjorn Helgaas On Thu, Dec 20, 2007 at 02:24:48PM -0800, Linus Torvalds wrote: > I'm not exactly 100% happy with it, but it does mean that if we need a big > area, we'll relax the suggested starting point by that amount. It's not > wonderful, but it essentially admits that the minimum for the allocations > is really just a hint, and if we need lots of space for a resource, we'll > relax the minimum point appropriately. This breaks in odd cases where the amount of memory in the system is not a nice round number. Like throwing two 128MB sticks into a system that already has 2gb. A 512MB allocation will get placed back at 2gb, on top of the end of ram. In order to get this kind of thing to work, you'd have to have a hard and a soft minimum. Even then, any random large allocation is going to ignore that buffer that you added. It'd be better if we could still tie this ignoring of the buffer to whether the bios placed the resource there in the first place. Perhaps this is one of those things that just aren't going to be solved properly without an xserver upgrade... r~ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PCI resource problems caused by improper address rounding 2007-12-21 0:39 ` Richard Henderson @ 2007-12-21 1:00 ` Linus Torvalds 0 siblings, 0 replies; 33+ messages in thread From: Linus Torvalds @ 2007-12-21 1:00 UTC (permalink / raw) To: Richard Henderson Cc: Chuck Ebbert, linux-kernel, Ivan Kokshaysky, Daniel Ritz, Greg KH, Keith Packard, Bjorn Helgaas On Thu, 20 Dec 2007, Richard Henderson wrote: > > This breaks in odd cases where the amount of memory in the system > is not a nice round number. Like throwing two 128MB sticks into > a system that already has 2gb. A 512MB allocation will get placed > back at 2gb, on top of the end of ram. No, no, you misunderstand. The kernel *always* takes known memory allocations into account. The "minimum PCI starting allocation" value is not there to protect memory we know about: the resource management already does that! So if you have real memory of 2GB+128MB, and you want a 512MB allocation, then yes, maybe the "preferred starting point" would be rounded back down to 2GB, but the resource allocator would still take known resources into account, and skip that address as being a conflict, and then try the next address that suits the alignment requirements, and try to see if there's a big enough hole at the 2.5GB mark. So it would all work fine. The reason we have that "min" parameter is not because of those _known_ resources, it's exactly because we have been bitten too many times by BIOSes that lay out magic undocumented resources in memory that we simply don't know about, because they aren't standard BAR resources, but some other special magic stuff. Things like the special ACPI areas that the northbridge recognizes, but aren't exposed as regular BAR's, but as just magic registers hidden in some undocumented NB register space. So the reason we have those PCIBIOS_MIN_IO/MEM things is not because we'd trample on top of memory without them, it's because we might trample the BIOS resources that it never told us about! 
Quite often, that's things like stolen RAM that doesn't show up in the e820 tables (it *should* show up as "reserved", but BIOS writers are generally incompetent drug-addicts picked up from the streets, who just randomly change BIOS tables until Windows boots on the machine), or the afore-mentioned magic IO registers for some special motherboard resource. > In order to get this kind of thing to work, you'd have to have a hard > and a soft minimum. We do have that "hard limit" - the resource management keeps track of all the resources it already knows about. The "soft limit" is exactly that PCIBIOS_MIN_MEM (which on a PC is that "pci_mem_start" variable). It's just a hint, but it's a pretty important one, exactly because we've been burned so many times by crap firmware and undocumented memory and MMIO ranges. Linus ^ permalink raw reply [flat|nested] 33+ messages in thread
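[Editorial sketch, not part of the original message.] The distinction drawn
above between the "hard limit" (resources the kernel already knows about)
and the "soft limit" (the starting hint) can be illustrated with a toy
first-fit search. This is an assumption-laden model, not the kernel's
allocate_resource(): struct range, toy_alloc() and all addresses are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct range { uint64_t start, end; };	/* inclusive bounds */

/* Toy first-fit search: begin at the soft hint, step through
 * align-sized candidates, and return the first address whose
 * [addr, addr + size - 1] overlaps none of the known busy ranges.
 * Returns 0 if nothing fits at or below limit. */
static uint64_t toy_alloc(const struct range *busy, size_t n,
			  uint64_t hint, uint64_t limit,
			  uint64_t size, uint64_t align)
{
	uint64_t addr;

	assert(size != 0 && align != 0 && (align & (align - 1)) == 0);
	addr = (hint + align - 1) & -align;

	for (; addr <= limit && size - 1 <= limit - addr; addr += align) {
		size_t i;

		for (i = 0; i < n; i++)
			if (addr <= busy[i].end &&
			    busy[i].start <= addr + size - 1)
				break;	/* conflicts with a known range */
		if (i == n)
			return addr;
	}
	return 0;
}
```

In this model, with RAM known to cover 0 through 2GB+256MB and a hint
rounded down to 2GB, a 512MB request skips the conflicting 2GB candidate
and lands at 2.5GB. What the model cannot protect is a range missing from
the busy list, which is the case the padded hint exists for.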
* Re: PCI resource problems caused by improper address rounding 2007-12-20 22:24 ` Linus Torvalds 2007-12-21 0:39 ` Richard Henderson @ 2007-12-21 2:28 ` Benjamin Herrenschmidt 1 sibling, 0 replies; 33+ messages in thread From: Benjamin Herrenschmidt @ 2007-12-21 2:28 UTC (permalink / raw) To: Linus Torvalds Cc: Richard Henderson, Chuck Ebbert, linux-kernel, Ivan Kokshaysky, Daniel Ritz, Greg KH, Keith Packard, Bjorn Helgaas > So in your case, it should *result* in the exact same situation that your > patch did, but at the same time, when dealing with the (more common) case > of smaller allocations, we still continue to try to avoid being too close > to the top-of-memory. > > So it's not perfect, but perhaps it is a good compromise between being > careful and having to make room? > > Does this work for your case? I'm not totally happy with changing the generic code like that, to possibly not enforce "min" anymore. Other archs may have very good reasons to provide a min value here... Though at the same time, at least on powerpc, the parent resource of the host bridge will be the real limit, so that may not be a big issue. Cheers, Ben. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PCI resource problems caused by improper address rounding 2007-12-18 21:09 ` Linus Torvalds 2007-12-18 21:46 ` Chuck Ebbert 2007-12-18 21:51 ` Richard Henderson @ 2007-12-18 22:16 ` Keith Packard 2007-12-19 0:29 ` Bjorn Helgaas 3 siblings, 0 replies; 33+ messages in thread From: Keith Packard @ 2007-12-18 22:16 UTC (permalink / raw) To: Linus Torvalds Cc: keithp, Richard Henderson, Chuck Ebbert, linux-kernel, Ivan Kokshaysky, Daniel Ritz, Greg KH, Bjorn Helgaas [-- Attachment #1: Type: text/plain, Size: 2122 bytes --] On Tue, 2007-12-18 at 13:09 -0800, Linus Torvalds wrote: > It's not like 256MB is even as large as they come, half-gig graphics cards > are getting to be fairly common at the high end, and X absolutely _has_ to > be able to handle a 64-bit address for those. We're now using a system-dependent wrapper library 'libpciaccess' for all of this stuff, it uses 64-bits for all PCI addresses and should make this transparent to the X driver. In addition, our kernel drivers are moving to support graphics cards that have memory beyond that addressable through their aperture, so we should be able to manage cards with even more memory, some of which is not reachable from the CPU. > Also, I'm surprised it doesn't work with X already: the ChangeLog for X > says that there are "Minor fixes to the handling of 64-bit PCI BARs [..]" > in 4.6.99.18, so I'd have assumed that XFree86-4.7.0 should be able to > handle this perfectly well. And that code has been replaced with an even more general library that abstracts away all of the PCI routing issues. > I'll add Keithp to the cc too, to see if the X issues can be clarified. > Maybe he can set us right. But maybe you just have an old X server? If so, > considering the situation, I really think the kernel has done a good job > already, and I'd be *very* nervous about making the kernel allocate new > PCI resources right after the end-of-memory thing. 
Trying a libpciaccess-based X server is certainly something worth doing, that should be 1.4 or later (thanks, git-describe). > I bet it would work in this case, but as mentioned, we definitely know of > cases where the BIOS did *not* document the magic memory region that was > stolen for UMA graphics, and trying to put PCI devices just after the top > of reserved memory in the e820 list causes machines to not work at all > because the address decoding will clash. There is an additional single-page BAR on 9xx chips which may end up mapped and not documented. Our kernel driver should correctly deal with that now though. -- keith.packard@intel.com [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PCI resource problems caused by improper address rounding 2007-12-18 21:09 ` Linus Torvalds ` (2 preceding siblings ...) 2007-12-18 22:16 ` Keith Packard @ 2007-12-19 0:29 ` Bjorn Helgaas 3 siblings, 0 replies; 33+ messages in thread From: Bjorn Helgaas @ 2007-12-19 0:29 UTC (permalink / raw) To: Linus Torvalds Cc: Richard Henderson, Chuck Ebbert, linux-kernel, Ivan Kokshaysky, Daniel Ritz, Greg KH, Keith Packard On Tuesday 18 December 2007 02:09:15 pm Linus Torvalds wrote: > > On Tue, 18 Dec 2007, Richard Henderson wrote: > > > > I've added dmesg, /proc/iomem, and lspci -v output to that bug. > > > > Basically, we have > > > > c0000000-cfffffff : free > > ddf00000-dfefffff : PCI Bus #04 > > e0000000-efffffff : pnp 00:0b > > f0000000-fedfffff : less than 256MB > > Gaah. > > That really is very unlucky. That 256M only goes at one point in the low > 4GB, but the thing is, it fits perfectly well above it, and dammit, that > resource is explicitly a 64-bit resource or a really good reason. > > However, I wonder about that > > e0000000-efffffff : pnp 00:0b > > thing. I actually suspect that that whole allocation is literally *meant* > for that 256MB graphics aperture, but the kernel explicitly avoids it > because it's listed in the PnP tables. > > I wonder what the heck is the point of that pnp entry. Just for fun, can > you try to just disable CONFIG_PNP, and see if it all works then? 00:0b must be a "motherboard" device, probably PNP0C01 or PNP0C02. Those are catch-all devices with no real programming model associated with them; they only describe resource usage. AFAICT, they're mostly used to describe legacy stuff like interrupt controllers, timers, etc. My laptop has the same range for one of its PNP0C02 devices. I'll try to dig up a chipset spec and see what might look like that range. We used to ignore anything past the first 8 I/O port regions and 4 memory regions (PNP_MAX_PORT and PNP_MAX_MEM), but those limits have been recently bumped a bit [1]. 
That will cause additional reservations that may explain some of the issues we're seeing. [1] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a7839e960675b549f06209d18283d5cee2ce9261 > Björn Helgaas added to Cc to clarify what those pnp entries tend to mean, > and whether there is possibly some way to match up a specific pnp entry > with the PCI device that might want to use it. Because that is a nice > 256MB region that really doesn't seem to make sense for anything else than > the graphics buffer - there's nothing else in your system that seems > likely (although I guess it could be for some docking port, but even then > I'd have expected one of the PCI bridges to map it!) > > But apart from the question about that pnp 00:0b device, the kernel > resource allocation really does look perfectly fine, and while we could > shoe-horn it into the low 4GB in this case by just hoping that there is > nothing undocumented there (and there probably isn't), it's really > annoying considering that big graphics areas are a hell of a good reason > to use those 64-bit resources. > > It's not like 256MB is even as large as they come, half-gig graphics cards > are getting to be fairly common at the high end, and X absolutely _has_ to > be able to handle a 64-bit address for those. > > Also, I'm surprised it doesn't work with X already: the ChangeLog for X > says that there are "Minor fixes to the handling of 64-bit PCI BARs [..]" > in 4.6.99.18, so I'd have assumed that XFree86-4.7.0 should be able to > handle this perfectly well. > > I'll add Keithp to the cc too, to see if the X issues can be clarified. > Maybe he can set us right. But maybe you just have an old X server? If so, > considering the situation, I really think the kernel has done a good job > already, and I'd be *very* nervous about making the kernel allocate new > PCI resources right after the end-of-memory thing. 
> > I bet it would work in this case, but as mentioned, we definitely know of > cases where the BIOS did *not* document the magic memory region that was > stolen for UMA graphics, and trying to put PCI devices just after the top > of reserved memory in the e820 list causes machines to not work at all > because the address decoding will clash. > > Of course, we could also make the minimum address more of a *hint*, and > only make the resource allocator only abut the top-of-known-memory when it > absolutely has to, but on the other hand, in this case it really doesn't > have to, since there's just _tons_ of space for 64-bit resources. So the > correct thing really does seem to be to just use the 64-bit hw that is > there. > > > That would have been an excellent comment to add to that code then, > > rather than just "rounding up to the next 1MB area", because purely > > as rounding code it is erroneous. > > Patches to add comments are welcome. There are few enough people who > actually work on the PCI resource allocation code these days (I wish there > were more), and it's very rare that anybody else than me or Ivan ends up > even *looking* at it. So it's not been a big issue. > > Linus > ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PCI resource problems caused by improper address rounding
2007-12-18 20:22 ` Richard Henderson
2007-12-18 21:09 ` Linus Torvalds
@ 2007-12-18 21:23 ` Ivan Kokshaysky
2007-12-18 21:46 ` Linus Torvalds
2007-12-20 21:10 ` Benjamin Herrenschmidt
2 siblings, 1 reply; 33+ messages in thread
From: Ivan Kokshaysky @ 2007-12-18 21:23 UTC (permalink / raw)
To: Linus Torvalds, Chuck Ebbert, linux-kernel, Daniel Ritz, Greg KH
On Tue, Dec 18, 2007 at 12:22:34PM -0800, Richard Henderson wrote:
> On Tue, Dec 18, 2007 at 10:21:50AM -0800, Linus Torvalds wrote:
> > ... and that would be an X server issue!).
>
> Of course, fixing the X server to *handle* 64-bit BARs is the correct
> solution. I've no idea how involved that is, but I have a sneaking
> suspicion that it uses that damned CARD32 datatype for everything.

Doh. Let's fix the kernel first...

Does this make any difference? (the patch is self explaining ;-)

Ivan.

--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -208,8 +208,9 @@ pci_setup_bridge(struct pci_bus *bus)
 	}
 	pci_write_config_dword(bridge, PCI_PREF_MEMORY_BASE, l);

-	/* Clear out the upper 32 bits of PREF base. */
-	pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, 0);
+	/* Set up the upper 32 bits of PREF base. */
+	l = region.start >> 16 >> 16;
+	pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, l);

 	pci_write_config_word(bridge, PCI_BRIDGE_CONTROL, bus->bridge_ctl);
 }
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PCI resource problems caused by improper address rounding
2007-12-18 21:23 ` Ivan Kokshaysky
@ 2007-12-18 21:46 ` Linus Torvalds
2007-12-20 8:46 ` Ivan Kokshaysky
0 siblings, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2007-12-18 21:46 UTC (permalink / raw)
To: Ivan Kokshaysky; +Cc: Chuck Ebbert, linux-kernel, Daniel Ritz, Greg KH
On Wed, 19 Dec 2007, Ivan Kokshaysky wrote:
>
> Doh. Let's fix the kernel first...
>
> Does this make any difference? (the patch is self explaining ;-)

Heh, indeed. Good catch - that

	Prefetchable memory behind bridge: 0000000000000000-000000000fffffff

on device 00:01.0 does look totally broken, and it would make more sense
if it matched what the device has (and what /proc/iomem reports).

That said, Intel bridges tend to be transparent even when they *claim*
normal decode, so I wonder whether it actually matters in this case. But
your patch looks very obviously right.

So maybe the rest of the kernel and X both already did everything right,
and it was just this stupid bridge setup thing that was broken (and
forcing the IOMEM resource to below the 4G mark just hid it).

		Linus
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PCI resource problems caused by improper address rounding
2007-12-18 21:46 ` Linus Torvalds
@ 2007-12-20 8:46 ` Ivan Kokshaysky
2007-12-20 21:21 ` Benjamin Herrenschmidt
2007-12-22 9:12 ` Andrew Morton
0 siblings, 2 replies; 33+ messages in thread
From: Ivan Kokshaysky @ 2007-12-20 8:46 UTC (permalink / raw)
To: Linus Torvalds
Cc: Richard Henderson, Chuck Ebbert, linux-kernel, Daniel Ritz, Greg KH
On Tue, Dec 18, 2007 at 01:46:56PM -0800, Linus Torvalds wrote:
> Heh, indeed. Good catch - that
>
> 	Prefetchable memory behind bridge: 0000000000000000-000000000fffffff
>
> on device 00:01.0 does look totally broken, and it would make more sense
> if it matched what the device has (and what /proc/iomem reports).

Sigh. The patch was way too incomplete - I somehow missed
PCI_PREF_LIMIT_UPPER32...

Corrected patch appended - I think it's worth applying in either case
since a bridge window at address 0 is an obvious bug and this patch
fixes it.

> That said, Intel bridges tend to be transparent even when they *claim*
> normal decode, so I wonder whether it actually matters in this case. But
> your patch looks very obviously right.

I don't think that transparency is the case here - I read specs for
some recent Intel chipsets and it looks like they are pretty accurate
now with a "subtractive decode" flag - over the last couple of years,
at least.
There are, of course, some strange "priority decode rules", but they
can be safely ignored as far as the kernel resource management is
concerned.

> So maybe the rest of the kernel and X both already did everything right,
> and it was just this stupid bridge setup thing that was broken (and
> forcing the IOMEM resource to below the 4G mark just hid it).

I've just checked the setup-bus code and have to say that its ability
to correctly handle the 64-bit BARs is close to zero...
On the positive side, getting it right doesn't seem to be as complicated
as I thought initially - mainly because only the prefetchable memory window
of p2p bridge is 64-bit. This effectively limits >4G allocations to
prefetchable resources only.

Anyway, using PCI bus space above 4G on x86 seems to be a must these days,
and I have some spare hardware to play with. Maybe I'll be able to get
something working by mid January or so...

Ivan.

---
PCI: do respect full 64-bit address for bridge prefetch window

Prevent the prefetch window from being programmed with a bogus address
when its respective resource gets allocated above the 4G mark.

Note that we cannot yet guarantee correct resource allocations
above 4G, though it might work in some simple cases.

Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
---
 drivers/pci/setup-bus.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 401e03c..643e72e 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -208,8 +208,11 @@ pci_setup_bridge(struct pci_bus *bus)
 	}
 	pci_write_config_dword(bridge, PCI_PREF_MEMORY_BASE, l);

-	/* Clear out the upper 32 bits of PREF base. */
-	pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, 0);
+	/* Set up the upper 32 bits of PREF base/limit. */
+	l = region.start >> 16 >> 16;
+	pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, l);
+	l = region.end >> 16 >> 16;
+	pci_write_config_dword(bridge, PCI_PREF_LIMIT_UPPER32, l);

 	pci_write_config_word(bridge, PCI_BRIDGE_CONTROL, bus->bridge_ctl);
 }
^ permalink raw reply related [flat|nested] 33+ messages in thread
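[Editorial sketch, not part of the original message.] To see why clearing
the upper registers produces a window at address 0, it may help to compute
the three register values for a window above 4G. The sketch below is a
user-space model under stated assumptions: struct pref_window_regs and
pack_pref_window() are hypothetical names, the low-word packing follows the
"(region.start >> 16) & 0xfff0" / "region.end & 0xfff00000" expressions
from pci_setup_bridge() as they appear elsewhere in this thread, and the
read-only type bits of the register are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the values written to the bridge. */
struct pref_window_regs {
	uint32_t base_limit;	/* goes to PCI_PREF_MEMORY_BASE */
	uint32_t base_upper;	/* goes to PCI_PREF_BASE_UPPER32 */
	uint32_t limit_upper;	/* goes to PCI_PREF_LIMIT_UPPER32 */
};

static struct pref_window_regs pack_pref_window(uint64_t start, uint64_t end)
{
	struct pref_window_regs r;

	assert(start <= end);
	/* Bits 31:20 of base and limit, packed into one dword. */
	r.base_limit  = (uint32_t)((start >> 16) & 0xfff0);
	r.base_limit |= (uint32_t)(end & 0xfff00000);
	/* ">> 16 >> 16" stays well defined even where the address
	 * type is only 32 bits wide; a single ">> 32" would not. */
	r.base_upper  = (uint32_t)(start >> 16 >> 16);
	r.limit_upper = (uint32_t)(end >> 16 >> 16);
	return r;
}
```

For an illustrative 256MB window starting at 4G (0x100000000 to
0x10fffffff), the low dword only encodes 0x00000000-0x0fffffff; if both
upper registers are written as 0, the bridge decodes exactly the bogus
range shown in the lspci line quoted above.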
* Re: PCI resource problems caused by improper address rounding
2007-12-20 8:46 ` Ivan Kokshaysky
@ 2007-12-20 21:21 ` Benjamin Herrenschmidt
2007-12-22 9:12 ` Andrew Morton
1 sibling, 0 replies; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2007-12-20 21:21 UTC (permalink / raw)
To: Ivan Kokshaysky
Cc: Linus Torvalds, Richard Henderson, Chuck Ebbert, linux-kernel, Daniel Ritz, Greg KH
Another turd is pci_scan_device(), which can't cope with 64-bit BARs on
32-bit platforms even when they have 64-bit resources.

I'll send a fix for that.

Ben.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PCI resource problems caused by improper address rounding
2007-12-20 8:46 ` Ivan Kokshaysky
2007-12-20 21:21 ` Benjamin Herrenschmidt
@ 2007-12-22 9:12 ` Andrew Morton
2007-12-22 9:20 ` Andrew Morton
1 sibling, 1 reply; 33+ messages in thread
From: Andrew Morton @ 2007-12-22 9:12 UTC (permalink / raw)
To: Ivan Kokshaysky
Cc: Linus Torvalds, Richard Henderson, Chuck Ebbert, linux-kernel, Daniel Ritz, Greg KH
On Thu, 20 Dec 2007 11:46:16 +0300 Ivan Kokshaysky <ink@jurassic.park.msu.ru> wrote:
> PCI: do respect full 64-bit address for bridge prefetch window
>
> Prevent the prefetch window from being programmed with a bogus address
> when its respective resource gets allocated above the 4G mark.
>
> Note that we cannot yet guarantee correct resource allocations
> above 4G, though it might work in some simple cases.
>

So.. did we agree that this patch is good to go?

> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -208,8 +208,11 @@ pci_setup_bridge(struct pci_bus *bus)
>  	}
>  	pci_write_config_dword(bridge, PCI_PREF_MEMORY_BASE, l);
>
> -	/* Clear out the upper 32 bits of PREF base. */
> -	pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, 0);
> +	/* Set up the upper 32 bits of PREF base/limit. */
> +	l = region.start >> 16 >> 16;

We have the little upper_32_bits() helper for this.

> +	pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, l);
> +	l = region.end >> 16 >> 16;
> +	pci_write_config_dword(bridge, PCI_PREF_LIMIT_UPPER32, l);
>
>  	pci_write_config_word(bridge, PCI_BRIDGE_CONTROL, bus->bridge_ctl);
>  }
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PCI resource problems caused by improper address rounding
2007-12-22 9:12 ` Andrew Morton
@ 2007-12-22 9:20 ` Andrew Morton
0 siblings, 0 replies; 33+ messages in thread
From: Andrew Morton @ 2007-12-22 9:20 UTC (permalink / raw)
To: Ivan Kokshaysky, Linus Torvalds, Richard Henderson, Chuck Ebbert, linux-kernel, Daniel Ritz, Greg KH
On Sat, 22 Dec 2007 01:12:18 -0800 Andrew Morton <akpm@linux-foundation.org> wrote:
> On Thu, 20 Dec 2007 11:46:16 +0300 Ivan Kokshaysky <ink@jurassic.park.msu.ru> wrote:
>
> > PCI: do respect full 64-bit address for bridge prefetch window
> >
> > Prevent the prefetch window from being programmed with a bogus address
> > when its respective resource gets allocated above the 4G mark.
> >
> > Note that we cannot yet guarantee correct resource allocations
> > above 4G, though it might work in some simple cases.
> >
>
> So.. did we agree that this patch is good to go?

Oh, I see Greg merged a different patch.

> > --- a/drivers/pci/setup-bus.c
> > +++ b/drivers/pci/setup-bus.c
> > @@ -208,8 +208,11 @@ pci_setup_bridge(struct pci_bus *bus)
> >  	}
> >  	pci_write_config_dword(bridge, PCI_PREF_MEMORY_BASE, l);
> >
> > -	/* Clear out the upper 32 bits of PREF base. */
> > -	pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, 0);
> > +	/* Set up the upper 32 bits of PREF base/limit. */
> > +	l = region.start >> 16 >> 16;
>
> We have the little upper_32_bits() helper for this.
>

Which could use this.
--- a/drivers/pci/setup-bus.c~gregkh-pci-pci-fix-bus-resource-assignment-on-32-bits-with-64b-resources-cleanup
+++ a/drivers/pci/setup-bus.c
@@ -206,10 +206,8 @@ pci_setup_bridge(struct pci_bus *bus)
 	if (bus->resource[2]->flags & IORESOURCE_PREFETCH) {
 		l = (region.start >> 16) & 0xfff0;
 		l |= region.end & 0xfff00000;
-#ifdef CONFIG_RESOURCES_64BIT
-		bu = region.start >> 32;
-		lu = region.end >> 32;
-#endif
+		bu = upper_32_bits(region.start);
+		lu = upper_32_bits(region.end);
 		DBG(KERN_INFO "  PREFETCH window: 0x%016llx-0x%016llx\n",
 		    (unsigned long long)region.start,
 		    (unsigned long long)region.end);
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PCI resource problems caused by improper address rounding
2007-12-18 20:22 ` Richard Henderson
2007-12-18 21:09 ` Linus Torvalds
2007-12-18 21:23 ` Ivan Kokshaysky
@ 2007-12-20 21:10 ` Benjamin Herrenschmidt
2 siblings, 0 replies; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2007-12-20 21:10 UTC (permalink / raw)
To: Richard Henderson
Cc: Linus Torvalds, Chuck Ebbert, linux-kernel, Ivan Kokshaysky, Daniel Ritz, Greg KH
> That won't work, because PCI_BASE_ADDRESS_MEM_TYPE_64 controls how
> many bits need to be written back to the BAR. If we changed that
> to PCI_BASE_ADDRESS_MEM_TYPE_32, we wouldn't clear the high 32-bits
> of the BAR.
>
> > ... and that would be an X server issue!).
>
> Of course, fixing the X server to *handle* 64-bit BARs is the correct
> solution. I've no idea how involved that is, but I have a sneaking
> suspicion that it uses that damned CARD32 datatype for everything.

A lot more than X needs to be fixed to handle 64-bit BARs btw. There's
a whole load of places in drivers/pci/* where we just puke if we see a
value >4G being assigned.

Now, there is some hope that the new X with libpciaccess can cope with
that, and even if it is broken, it would be much easier to fix, as X in
that case is no longer trying to bypass the kernel, but instead uses
proper kernel interfaces to map device resources.

That used to be the Xorg pci-rework branch, though it might have been
merged into the trunk by now.

Ben.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: PCI resource problems caused by improper address rounding
2007-12-18 0:25 PCI resource problems caused by improper address rounding Chuck Ebbert
2007-12-18 0:57 ` Linus Torvalds
@ 2007-12-22 9:22 ` Andrew Morton
1 sibling, 0 replies; 33+ messages in thread
From: Andrew Morton @ 2007-12-22 9:22 UTC (permalink / raw)
To: Chuck Ebbert
Cc: linux-kernel, Ivan Kokshaysky, Linus Torvalds, Daniel Ritz, Greg KH, Richard Henderson, Ingo Molnar

On Mon, 17 Dec 2007 19:25:27 -0500 Chuck Ebbert <cebbert@redhat.com> wrote:

> Looks like a commit that I can't find in git due to the arch merge
> has broken PCI address assignment. This patch by Richard Henderson
> against 2.6.23 fixes it for x86_64:
>
> --- linux-2.6.23.x86_64/arch/x86_64/kernel/e820.c 2007-10-09 13:31:38.000000000 -0700
> +++ linux-2.6.23.x86_64-rth/arch/x86_64/kernel/e820.c 2007-12-15 12:37:44.000000000 -0800
> @@ -718,8 +718,8 @@ __init void e820_setup_gap(void)
>  	while ((gapsize >> 4) > round)
>  		round += round;
>  	/* Fun with two's complement */
> -	pci_mem_start = (gapstart + round) & -round;
> +	pci_mem_start = (gapstart + round - 1) & -round;
>
>  	printk(KERN_INFO "Allocating PCI resources starting at %lx (gap: %lx:%lx)\n",
>  		pci_mem_start, gapstart, gapsize);
>
>
> Here is the original changeset, taken from the Mercurial repo. It was
> merged in 2.6.14:
>
> # HG changeset patch
> # User Daniel Ritz <daniel.ritz@gmx.ch>
> # Date 1126304746 -700
> # Node ID 51367d6e0b839be0b425a8f67c29f625b670f126
> # Parent f4852c862b04efc9f8e2c7913191f5f7d140d895
> [PATCH] Update PCI IOMEM allocation start
>
> This fixes the problem with "Averatec 6240 pcmcia_socket0: unable to
> apply power", which was due to the CardBus IOMEM register region being
> allocated at an address that was actually inside the RAM window that had
> been reserved for video frame-buffers in an UMA setup.
>
> The BIOS _should_ have marked that region reserved in the e820 memory
> descriptor tables, but did not.
>
> It is fixed by rounding up the default starting address of PCI memory
> allocations, so that we leave a bigger gap after the final known memory
> location. The amount of rounding depends on how big the unused memory
> gap is that we can allocate IOMEM from.
>
> Based on example code by Linus.
>
> Acked-by: Greg KH <greg@kroah.com>
> Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
>
> committer: Linus Torvalds <torvalds@g5.osdl.org> 1126304746 -0700
>
>
> --- a/arch/i386/kernel/setup.c Fri Sep 09 22:28:40 2005 +0011
> +++ b/arch/i386/kernel/setup.c Fri Sep 09 22:37:26 2005 +0011
> @@ -1300,7 +1300,7 @@ legacy_init_iomem_resources(struct resou
>   */
>  static void __init register_memory(void)
>  {
> -	unsigned long gapstart, gapsize;
> +	unsigned long gapstart, gapsize, round;
>  	unsigned long long last;
>  	int i;
>
> @@ -1345,14 +1345,14 @@ static void __init register_memory(void)
>  	}
>
>  	/*
> -	 * Start allocating dynamic PCI memory a bit into the gap,
> -	 * aligned up to the nearest megabyte.
> -	 *
> -	 * Question: should we try to pad it up a bit (do something
> -	 * like " + (gapsize >> 3)" in there too?). We now have the
> -	 * technology.
> +	 * See how much we want to round up: start off with
> +	 * rounding to the next 1MB area.
>  	 */
> -	pci_mem_start = (gapstart + 0xfffff) & ~0xfffff;
> +	round = 0x100000;
> +	while ((gapsize >> 4) > round)
> +		round += round;
> +	/* Fun with two's complement */
> +	pci_mem_start = (gapstart + round) & -round;
>
>  	printk("Allocating PCI resources starting at %08lx (gap: %08lx:%08lx)\n",
>  		pci_mem_start, gapstart, gapsize);
> --- a/arch/x86_64/kernel/e820.c Fri Sep 09 22:28:40 2005 +0011
> +++ b/arch/x86_64/kernel/e820.c Fri Sep 09 22:37:26 2005 +0011
> @@ -567,7 +567,7 @@ unsigned long pci_mem_start = 0xaeedbabe
>   */
>  __init void e820_setup_gap(void)
>  {
> -	unsigned long gapstart, gapsize;
> +	unsigned long gapstart, gapsize, round;
>  	unsigned long last;
>  	int i;
>  	int found = 0;
> @@ -604,14 +604,14 @@ __init void e820_setup_gap(void)
>  	}
>
>  	/*
> -	 * Start allocating dynamic PCI memory a bit into the gap,
> -	 * aligned up to the nearest megabyte.
> -	 *
> -	 * Question: should we try to pad it up a bit (do something
> -	 * like " + (gapsize >> 3)" in there too?). We now have the
> -	 * technology.
> +	 * See how much we want to round up: start off with
> +	 * rounding to the next 1MB area.
>  	 */
> -	pci_mem_start = (gapstart + 0xfffff) & ~0xfffff;
> +	round = 0x100000;
> +	while ((gapsize >> 4) > round)
> +		round += round;
> +	/* Fun with two's complement */
> +	pci_mem_start = (gapstart + round) & -round;
>
>  	printk(KERN_INFO "Allocating PCI resources starting at %lx (gap: %lx:%lx)\n",
>  		pci_mem_start, gapstart, gapsize);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
[parent not found: <fa.WmGIH8th8MfmciABVSBi6whxeFE@ifi.uio.no>]
[parent not found: <fa.Obg5E3fyax+MaF94//uo40q/Zyk@ifi.uio.no>]
[parent not found: <fa./6K5nXEIpws4VU8HtJhQjF4AoGg@ifi.uio.no>]
[parent not found: <fa.V82IIxMkW3eu+9B44NfoyYYQDP4@ifi.uio.no>]
[parent not found: <fa.DM9AyNQQtam66XpKVeXeqS639os@ifi.uio.no>]
[parent not found: <fa.f5O3U527Rv8DNk05hDFRjdCaeFE@ifi.uio.no>]
* Re: PCI resource problems caused by improper address rounding
[not found] ` <fa.f5O3U527Rv8DNk05hDFRjdCaeFE@ifi.uio.no>
@ 2007-12-19 0:11 ` Robert Hancock
2007-12-19 0:55 ` Chuck Ebbert
[not found] ` <fa.PJGSMm4TIW6lRYng/jDqooIvj8U@ifi.uio.no>
1 sibling, 1 reply; 33+ messages in thread
From: Robert Hancock @ 2007-12-19 0:11 UTC (permalink / raw)
To: Linus Torvalds
Cc: Richard Henderson, Chuck Ebbert, linux-kernel, Ivan Kokshaysky, Daniel Ritz, Greg KH, Keith Packard, Bjorn Helgaas

Linus Torvalds wrote:
>
> On Tue, 18 Dec 2007, Richard Henderson wrote:
>> I've added dmesg, /proc/iomem, and lspci -v output to that bug.
>>
>> Basically, we have
>>
>> c0000000-cfffffff : free
>> ddf00000-dfefffff : PCI Bus #04
>> e0000000-efffffff : pnp 00:0b
>> f0000000-fedfffff : less than 256MB
>
> Gaah.
>
> That really is very unlucky. That 256M only goes at one point in the low
> 4GB, but the thing is, it fits perfectly well above it, and dammit, that
> resource is explicitly a 64-bit resource for a really good reason.
>
> However, I wonder about that
>
> e0000000-efffffff : pnp 00:0b
>
> thing. I actually suspect that that whole allocation is literally *meant*
> for that 256MB graphics aperture, but the kernel explicitly avoids it
> because it's listed in the PnP tables.

That is probably the MMCONFIG aperture, in which case any attempt to map
the graphics BAR there will have disastrous results. (This BIOS has an
MCFG table, though it looks like this Fedora kernel has MMCONFIG
disabled, so we can't tell what it actually contains.)

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/
* Re: PCI resource problems caused by improper address rounding
2007-12-19 0:11 ` Robert Hancock
@ 2007-12-19 0:55 ` Chuck Ebbert
2007-12-19 1:12 ` Richard Henderson
0 siblings, 1 reply; 33+ messages in thread
From: Chuck Ebbert @ 2007-12-19 0:55 UTC (permalink / raw)
To: Robert Hancock
Cc: Linus Torvalds, Richard Henderson, linux-kernel, Ivan Kokshaysky, Daniel Ritz, Greg KH, Keith Packard, Bjorn Helgaas

On 12/18/2007 07:11 PM, Robert Hancock wrote:
>> However, I wonder about that
>>
>> e0000000-efffffff : pnp 00:0b
>>
>> thing. I actually suspect that that whole allocation is literally
>> *meant* for that 256MB graphics aperture, but the kernel explicitly
>> avoids it because it's listed in the PnP tables.
>
> That is probably the MMCONFIG aperture, in that case any attempt to map
> the graphics BAR there will have disastrous results. (This BIOS has an
> MCFG table, though it looks like this Fedora kernel has MMCONFIG
> disabled, so we can't tell what it actually contains.)
>

You can boot with "pci=mmconf" to enable it.
* Re: PCI resource problems caused by improper address rounding
2007-12-19 0:55 ` Chuck Ebbert
@ 2007-12-19 1:12 ` Richard Henderson
2007-12-19 3:12 ` Linus Torvalds
0 siblings, 1 reply; 33+ messages in thread
From: Richard Henderson @ 2007-12-19 1:12 UTC (permalink / raw)
To: Chuck Ebbert
Cc: Robert Hancock, Linus Torvalds, linux-kernel, Ivan Kokshaysky, Daniel Ritz, Greg KH, Keith Packard, Bjorn Helgaas

On Tue, Dec 18, 2007 at 07:55:37PM -0500, Chuck Ebbert wrote:
> You can boot with "pci=mmconf" to enable it.

Heh.

PCI: BIOS Bug: MCFG area at e0000000 is not E820-reserved
PCI: Not using MMCONFIG.

r~
* Re: PCI resource problems caused by improper address rounding
2007-12-19 1:12 ` Richard Henderson
@ 2007-12-19 3:12 ` Linus Torvalds
0 siblings, 0 replies; 33+ messages in thread
From: Linus Torvalds @ 2007-12-19 3:12 UTC (permalink / raw)
To: Richard Henderson
Cc: Chuck Ebbert, Robert Hancock, linux-kernel, Ivan Kokshaysky, Daniel Ritz, Greg KH, Keith Packard, Bjorn Helgaas

On Tue, 18 Dec 2007, Richard Henderson wrote:
>
> Heh.
>
> PCI: BIOS Bug: MCFG area at e0000000 is not E820-reserved
> PCI: Not using MMCONFIG.

Well, that at least confirms that e0000000 is indeed the mmconfig area.

One of these days we'll trust the ACPI resource data enough that we can
use mmconfig even when it's just reserved in those PnP things, which is
apparently how BIOS writers are suggested to do it (stupidly enough, but
whatever).

Linus
[parent not found: <fa.PJGSMm4TIW6lRYng/jDqooIvj8U@ifi.uio.no>]
[parent not found: <fa.0UHHdYi5zqyJ2xOPhNk/BhJkxYM@ifi.uio.no>]
* Re: PCI resource problems caused by improper address rounding
[not found] ` <fa.0UHHdYi5zqyJ2xOPhNk/BhJkxYM@ifi.uio.no>
@ 2007-12-19 0:18 ` Robert Hancock
0 siblings, 0 replies; 33+ messages in thread
From: Robert Hancock @ 2007-12-19 0:18 UTC (permalink / raw)
To: Linus Torvalds
Cc: Chuck Ebbert, Richard Henderson, linux-kernel, Ivan Kokshaysky, Daniel Ritz, Greg KH, Keith Packard, Bjorn Helgaas

Linus Torvalds wrote:
>
> On Tue, 18 Dec 2007, Chuck Ebbert wrote:
>
>> On 12/18/2007 04:09 PM, Linus Torvalds wrote:
>>> I wonder what the heck is the point of that pnp entry. Just for fun, can
>>> you try to just disable CONFIG_PNP, and see if it all works then?
>> pnpacpi=off should work.
>>
>> PnP is also trying (and failing) to reserve all physical memory.
>
> Yeah, that really is a pretty confused-looking pnp table thing. But I have
> absolutely zero idea how PnP is even supposed to work - the whole thing is
> just a total hack for Windows, afaik.
>
> The sad part is that *normally* the right thing to do about almost any
> BIOS information is what we do right now: just avoid that magic address
> range like the plague, because we have no clue what the heck the BIOS is
> up to. But it looks like in this particular case, some of the problems
> may arise exactly *because* we avoid that range.
>
> It would be good to know what Windows does. If ACPI is found, does it
> perhaps just ignore all the PnP entries these days?
>
> Linus

ACPI is where those PnP entries are coming from (on any modern system
anyway). They do show up in Device Manager as devices with resources
(the one that reserves all of system RAM on my machine is labeled
"System board", others like the one that reserves the MMCONFIG aperture
are "Motherboard resources" - the name is based on the PNP device ID, I
believe). It could be that Windows is stupid enough that it will map
things over top of physical RAM if the BIOS doesn't explicitly reserve
it like that.

I suspect based on some comments in Microsoft documents that Windows
uses the E820 table to figure out where the RAM is, and ACPI/PnP
information to figure out where IO mappings are, but may not really
combine those two pieces of information into one overall map like Linux
does, which would explain why it needs to reserve all physical RAM.

(As mentioned in another post, I would guess the BIOS is reserving that
memory range since it's the MMCONFIG aperture.)

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/
* Re: PCI resource problems caused by improper address rounding
[not found] ` <fa.Obg5E3fyax+MaF94//uo40q/Zyk@ifi.uio.no>
[not found] ` <fa./6K5nXEIpws4VU8HtJhQjF4AoGg@ifi.uio.no>
@ 2007-12-19 0:38 ` Robert Hancock
1 sibling, 0 replies; 33+ messages in thread
From: Robert Hancock @ 2007-12-19 0:38 UTC (permalink / raw)
To: Linus Torvalds
Cc: Chuck Ebbert, linux-kernel, Ivan Kokshaysky, Daniel Ritz, Greg KH

Linus Torvalds wrote:
>
> On Mon, 17 Dec 2007, Chuck Ebbert wrote:
>> Looks like a commit that I can't find in git due to the arch merge
>> has broken PCI address assignment. This patch by Richard Henderson
>> against 2.6.23 fixes it for x86_64:
>>
>> --- linux-2.6.23.x86_64/arch/x86_64/kernel/e820.c 2007-10-09 13:31:38.000000000 -0700
>> +++ linux-2.6.23.x86_64-rth/arch/x86_64/kernel/e820.c 2007-12-15 12:37:44.000000000 -0800
>> @@ -718,8 +718,8 @@ __init void e820_setup_gap(void)
>>  	while ((gapsize >> 4) > round)
>>  		round += round;
>>  	/* Fun with two's complement */
>> -	pci_mem_start = (gapstart + round) & -round;
>> +	pci_mem_start = (gapstart + round - 1) & -round;
>
> No, it's very much meant to be that way.
>
> We do *not* want to have the PCI memory abut the end of memory exactly. So
> it leaves a gap in between "gapstart" and the actual start of PCI memory
> addressing very much on purpose.
>
> In fact, the very commit (it's f0eca9626c6becb6fc56106b2e4287c6c784af3d in
> the kernel tree) you mention actually explicitly *explains* that, although
> maybe it's a bit indirect: if you start allocating PCI resources directly
> after the end-of-RAM thing, you can easily end up using addresses that are
> actually inside the magic stolen system RAM that is being used for UMA
> video etc.
>
> So you very much want to have a buffer in between the end-of-RAM and the
> actual start of the region we try to allocate in.
>
> So why do you want them to be close, anyway?
>
> Linus
>
> PS. On a different topic: if you do
>
> 	git log --follow arch/x86/kernel/e820_64.c
>
> you'd see the history past the renames in git. Or just do a "git blame -C"
> which will also follow renames (and copies).

That patch is from the 2.6.14 era - I don't think we even did PnP ACPI
resource reservation handling then? It could be that the BIOS was trying
to tell us that UMA memory region is reserved using PnP ACPI
reservations, but we just ignored it.

It seems rather arbitrary in how much it leaves unused - and in this
case, likely prevents us from using the nice big open gap that the BIOS
presumably expected the graphics card to be mapped into.

I suspect this buffer space insertion is really not needed at this
point. The patch description is likely technically correct in that the
BIOS should have reserved it in E820, but (according to MS comments in a
presentation I read) Windows doesn't use E820 for anything other than
figuring out where RAM is, it uses PnP ACPI for figuring out areas it
needs to avoid. Since BIOS writers test against that behavior, there are
surely lots of systems where ignoring PnP ACPI reservations and relying
only on E820 would result in things really going blammo (like mapping
things over MMCONFIG tables for instance). So disabling it on modern
machines is really not an option. And if it's enabled, you likely
wouldn't hit the problem it tries to fix.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/
end of thread, other threads:[~2007-12-22 9:24 UTC | newest]
Thread overview: 33+ messages
2007-12-18 0:25 PCI resource problems caused by improper address rounding Chuck Ebbert
2007-12-18 0:57 ` Linus Torvalds
2007-12-18 17:34 ` Chuck Ebbert
2007-12-18 18:21 ` Linus Torvalds
2007-12-18 20:22 ` Richard Henderson
2007-12-18 21:09 ` Linus Torvalds
2007-12-18 21:46 ` Chuck Ebbert
2007-12-18 21:56 ` Linus Torvalds
2007-12-18 22:17 ` Richard Henderson
2007-12-18 21:51 ` Richard Henderson
2007-12-18 22:31 ` Linus Torvalds
2007-12-19 1:38 ` Linus Torvalds
2007-12-20 21:52 ` Richard Henderson
2007-12-20 22:24 ` Linus Torvalds
2007-12-21 0:39 ` Richard Henderson
2007-12-21 1:00 ` Linus Torvalds
2007-12-21 2:28 ` Benjamin Herrenschmidt
2007-12-18 22:16 ` Keith Packard
2007-12-19 0:29 ` Bjorn Helgaas
2007-12-18 21:23 ` Ivan Kokshaysky
2007-12-18 21:46 ` Linus Torvalds
2007-12-20 8:46 ` Ivan Kokshaysky
2007-12-20 21:21 ` Benjamin Herrenschmidt
2007-12-22 9:12 ` Andrew Morton
2007-12-22 9:20 ` Andrew Morton
2007-12-20 21:10 ` Benjamin Herrenschmidt
2007-12-22 9:22 ` Andrew Morton
[not found] <fa.WmGIH8th8MfmciABVSBi6whxeFE@ifi.uio.no>
[not found] ` <fa.Obg5E3fyax+MaF94//uo40q/Zyk@ifi.uio.no>
[not found] ` <fa./6K5nXEIpws4VU8HtJhQjF4AoGg@ifi.uio.no>
[not found] ` <fa.V82IIxMkW3eu+9B44NfoyYYQDP4@ifi.uio.no>
[not found] ` <fa.DM9AyNQQtam66XpKVeXeqS639os@ifi.uio.no>
[not found] ` <fa.f5O3U527Rv8DNk05hDFRjdCaeFE@ifi.uio.no>
2007-12-19 0:11 ` Robert Hancock
2007-12-19 0:55 ` Chuck Ebbert
2007-12-19 1:12 ` Richard Henderson
2007-12-19 3:12 ` Linus Torvalds
[not found] ` <fa.PJGSMm4TIW6lRYng/jDqooIvj8U@ifi.uio.no>
[not found] ` <fa.0UHHdYi5zqyJ2xOPhNk/BhJkxYM@ifi.uio.no>
2007-12-19 0:18 ` Robert Hancock
2007-12-19 0:38 ` Robert Hancock