public inbox for linux-acpi@vger.kernel.org
 help / color / mirror / Atom feed
* acpi based pci gap calculation  - v3
@ 2008-07-15 18:59 Alok Kataria
  2008-07-15 20:28 ` Jesse Barnes
  2008-07-16  5:42 ` Andi Kleen
  0 siblings, 2 replies; 17+ messages in thread
From: Alok Kataria @ 2008-07-15 18:59 UTC (permalink / raw)
  To: ak, Ingo Molnar, Brown, Len; +Cc: LKML, linux-acpi


Hi Andi,

I don't see this patch being applied in any of the tree yet.
Resending this incase you guys might have missed it. 
I assume this will go through the ACPI tree. 

The need for this patch was explained in 
http://marc.info/?l=linux-kernel&m=121441818818598&w=2

The v2 patch was discussed in 
http://marc.info/?l=linux-acpi&m=121433339619175&w=2

I then sent a incremental patch on top of v2 to fix some conditions and
handle the change in the e820_search_gap interface. 

In this v3 patch, i have folded the incremental fix into the v2 patch.
Please apply this.

Thanks,
Alok

--
acpi based pci gap calculation - v3

From: Alok N Kataria <akataria@vmware.com>

Evaluates the _CRS object under PCI0 looking for producer resources.
Then searches the e820 memory space for a gap within these producer resources.

Allows us to find a gap for the unclaimed pci resources or MMIO resources
for hotplug devices within the BIOS allowed pci regions.

v1->v2: Some changes in the print strings.
    	Also note the prototype for e820_search_gap has changed in this
    	iteration, but the return value is ignored while calling it over here.
v2->v3: Pass the pci producer resources end_addr (got from _CRS) to the
	e820_search_gap function.
	We limit the search only within the producer region's address space.
	Also add a condition to check if the resource lies in the 32bit
	address range.

Signed-off-by: Alok N Kataria <akataria@vmware.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
---

 arch/x86/pci/acpi.c |   80 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 80 insertions(+), 0 deletions(-)


diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
index 19af069..fff2c42 100644
--- a/arch/x86/pci/acpi.c
+++ b/arch/x86/pci/acpi.c
@@ -4,6 +4,7 @@
 #include <linux/irq.h>
 #include <linux/dmi.h>
 #include <asm/numa.h>
+#include <asm/e820.h>
 #include "pci.h"
 
 struct pci_root_info {
@@ -14,6 +15,11 @@ struct pci_root_info {
 	int busnum;
 };
 
+struct gap_info {
+	unsigned long gapstart;
+	unsigned long gapsize;
+};
+
 static acpi_status
 resource_to_addr(struct acpi_resource *resource,
 			struct acpi_resource_address64 *addr)
@@ -110,6 +116,78 @@ adjust_transparent_bridge_resources(struct pci_bus *bus)
 	}
 }
 
+static acpi_status search_gap(struct acpi_resource *resource, void *data)
+{
+	struct acpi_resource_address64 addr;
+	acpi_status status;
+	struct gap_info *gap = data;
+	unsigned long long start_addr, end_addr;
+
+	status = resource_to_addr(resource, &addr);
+	if (ACPI_SUCCESS(status) &&
+	    addr.resource_type == ACPI_MEMORY_RANGE &&
+	    addr.address_length > gap->gapsize) {
+		start_addr = addr.minimum + addr.translation_offset;
+		/*
+		 * We want space only in the 32bit address range
+		 */
+		if (start_addr < UINT_MAX) {
+			end_addr = start_addr + addr.address_length;
+			e820_search_gap(&gap->gapstart, &gap->gapsize,
+					start_addr, end_addr);
+		}
+	}
+
+	return AE_OK;
+}
+
+/*
+ * Search for a hole in the 32 bit address space for PCI to assign MMIO
+ * resources, for hotplug or unconfigured resources.
+ * We query the CRS object of the PCI root device to look for possible producer
+ * resources in the tree and consider these while calulating the start address
+ * for this hole.
+ */
+static void pci_setup_gap(acpi_handle *handle)
+{
+	struct gap_info gap;
+	acpi_status status;
+
+	gap.gapstart = 0;
+	gap.gapsize = 0x400000;
+
+	status = acpi_walk_resources(handle, METHOD_NAME__CRS,
+				     search_gap, &gap);
+
+	if (ACPI_SUCCESS(status)) {
+		unsigned long round;
+
+		if (!gap.gapstart) {
+			printk(KERN_ERR "ACPI: Warning: Cannot find a gap "
+					"in the 32bit address range for PCI\n"
+					"ACPI: PCI devices may collide with "
+					"hotpluggable memory address range\n");
+		}
+		/*
+		 * Round the gapstart, uses the same logic as in
+		 * e820_gap_setup
+		 */
+		round = 0x100000;
+		while ((gap.gapsize >> 4) > round)
+			round += round;
+		/* Fun with two's complement */
+		pci_mem_start = (gap.gapstart + round) & -round;
+
+		printk(KERN_INFO "ACPI: PCI resources should "
+				"start at %lx (gap: %lx:%lx)\n",
+				pci_mem_start, gap.gapstart, gap.gapsize);
+	} else {
+		printk(KERN_ERR "ACPI: Error while searching for gap in "
+				"the 32bit address range for PCI\n");
+	}
+}
+
+
 static void
 get_current_resources(struct acpi_device *device, int busnum,
 			int domain, struct pci_bus *bus)
@@ -220,6 +298,8 @@ struct pci_bus * __devinit pci_acpi_scan_root(struct acpi_device *device, int do
 
 	if (bus && (pci_probe & PCI_USE__CRS))
 		get_current_resources(device, busnum, domain, bus);
+
+	pci_setup_gap(device->handle);
 	return bus;
 }
 



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: acpi based pci gap calculation  - v3
  2008-07-15 18:59 acpi based pci gap calculation - v3 Alok Kataria
@ 2008-07-15 20:28 ` Jesse Barnes
  2008-07-15 20:54   ` Alok Kataria
  2008-07-16  5:42 ` Andi Kleen
  1 sibling, 1 reply; 17+ messages in thread
From: Jesse Barnes @ 2008-07-15 20:28 UTC (permalink / raw)
  To: akataria; +Cc: ak, Ingo Molnar, Brown, Len, LKML, linux-acpi, linux-pci, TJ

On Tuesday, July 15, 2008 11:59 am Alok Kataria wrote:
> Hi Andi,
>
> I don't see this patch being applied in any of the tree yet.
> Resending this incase you guys might have missed it.
> I assume this will go through the ACPI tree.
>
> The need for this patch was explained in
> http://marc.info/?l=linux-kernel&m=121441818818598&w=2
>
> The v2 patch was discussed in
> http://marc.info/?l=linux-acpi&m=121433339619175&w=2
>
> I then sent a incremental patch on top of v2 to fix some conditions and
> handle the change in the e820_search_gap interface.
>
> In this v3 patch, i have folded the incremental fix into the v2 patch.
> Please apply this.

This should probably go to linux-pci too.

I wonder if this is stable enough to go into 2.6.27?  Most of the PCI bugs we 
have at the moment are related to PCI resource allocation failures.  My hope 
is that finding more space will fix most of them.  Assuming this patch 
doesn't have any dependencies, I can put it in my linux-next branch.

Also, TJ was working on something similar; any comments TJ?

Thanks,
Jesse

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: acpi based pci gap calculation  - v3
  2008-07-15 20:28 ` Jesse Barnes
@ 2008-07-15 20:54   ` Alok Kataria
  2008-07-15 22:53     ` Jesse Barnes
  0 siblings, 1 reply; 17+ messages in thread
From: Alok Kataria @ 2008-07-15 20:54 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: ak@linux.intel.com, Ingo Molnar, Brown, Len, LKML, linux-acpi,
	linux-pci@vger.kernel.org, TJ

On Tue, 2008-07-15 at 13:28 -0700, Jesse Barnes wrote:
> On Tuesday, July 15, 2008 11:59 am Alok Kataria wrote:
> > Hi Andi,
> >
> > I don't see this patch being applied in any of the tree yet.
> > Resending this incase you guys might have missed it.
> > I assume this will go through the ACPI tree.
> >
> > The need for this patch was explained in
> > http://marc.info/?l=linux-kernel&m=121441818818598&w=2
> >
> > The v2 patch was discussed in
> > http://marc.info/?l=linux-acpi&m=121433339619175&w=2
> >
> > I then sent a incremental patch on top of v2 to fix some conditions and
> > handle the change in the e820_search_gap interface.
> >
> > In this v3 patch, i have folded the incremental fix into the v2 patch.
> > Please apply this.
> 
> This should probably go to linux-pci too.
> 
> I wonder if this is stable enough to go into 2.6.27?  

I have tested it with couple of different BIOS settings and it seems to
work as it should.

> Most of the PCI bugs we
> have at the moment are related to PCI resource allocation failures.  My hope
> is that finding more space will fix most of them.  Assuming this patch
> doesn't have any dependencies, I can put it in my linux-next branch.

No dependencies, I had added a function e820_search_gap which is used by
this patch. This function is already in the mainline tree. 
Thanks for applying.

--
Alok


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: acpi based pci gap calculation  - v3
  2008-07-15 20:54   ` Alok Kataria
@ 2008-07-15 22:53     ` Jesse Barnes
  2008-07-15 23:07       ` Ingo Molnar
  2008-07-16  5:40       ` Andi Kleen
  0 siblings, 2 replies; 17+ messages in thread
From: Jesse Barnes @ 2008-07-15 22:53 UTC (permalink / raw)
  To: akataria
  Cc: ak@linux.intel.com, Ingo Molnar, Brown, Len, LKML, linux-acpi,
	linux-pci@vger.kernel.org, TJ

On Tuesday, July 15, 2008 1:54 pm Alok Kataria wrote:
> I have tested it with couple of different BIOS settings and it seems to
> work as it should.
>
> > Most of the PCI bugs we
> > have at the moment are related to PCI resource allocation failures.  My
> > hope is that finding more space will fix most of them.  Assuming this
> > patch doesn't have any dependencies, I can put it in my linux-next
> > branch.
>
> No dependencies, I had added a function e820_search_gap which is used by
> this patch. This function is already in the mainline tree.
> Thanks for applying.

Ok, I stuffed it in my linux-next branch.  I'll let it sit there for a day or 
so though, just to shake out any build problems etc. in linux-next before 
asking Linus to pull the whole lot.

Thanks,
Jesse

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: acpi based pci gap calculation  - v3
  2008-07-15 22:53     ` Jesse Barnes
@ 2008-07-15 23:07       ` Ingo Molnar
  2008-07-16  5:40       ` Andi Kleen
  1 sibling, 0 replies; 17+ messages in thread
From: Ingo Molnar @ 2008-07-15 23:07 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: akataria, ak@linux.intel.com, Brown, Len, LKML, linux-acpi,
	linux-pci@vger.kernel.org, TJ


* Jesse Barnes <jbarnes@virtuousgeek.org> wrote:

> On Tuesday, July 15, 2008 1:54 pm Alok Kataria wrote:
> > I have tested it with couple of different BIOS settings and it seems to
> > work as it should.
> >
> > > Most of the PCI bugs we
> > > have at the moment are related to PCI resource allocation failures.  My
> > > hope is that finding more space will fix most of them.  Assuming this
> > > patch doesn't have any dependencies, I can put it in my linux-next
> > > branch.
> >
> > No dependencies, I had added a function e820_search_gap which is 
> > used by this patch. This function is already in the mainline tree. 
> > Thanks for applying.
> 
> Ok, I stuffed it in my linux-next branch.  I'll let it sit there for a 
> day or so though, just to shake out any build problems etc. in 
> linux-next before asking Linus to pull the whole lot.

thanks. The general x86 infrastructure bits this patch depends on are 
upstream already.

	Ingo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: acpi based pci gap calculation  - v3
  2008-07-15 22:53     ` Jesse Barnes
  2008-07-15 23:07       ` Ingo Molnar
@ 2008-07-16  5:40       ` Andi Kleen
  2008-07-16 16:06         ` Jesse Barnes
  1 sibling, 1 reply; 17+ messages in thread
From: Andi Kleen @ 2008-07-16  5:40 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: akataria, Ingo Molnar, Brown, Len, LKML, linux-acpi,
	linux-pci@vger.kernel.org, TJ

Jesse Barnes wrote:
> On Tuesday, July 15, 2008 1:54 pm Alok Kataria wrote:
>> I have tested it with couple of different BIOS settings and it seems to
>> work as it should.
>>
>>> Most of the PCI bugs we
>>> have at the moment are related to PCI resource allocation failures.  My
>>> hope is that finding more space will fix most of them.  Assuming this
>>> patch doesn't have any dependencies, I can put it in my linux-next
>>> branch.
>> No dependencies, I had added a function e820_search_gap which is used by
>> this patch. This function is already in the mainline tree.
>> Thanks for applying.
> 
> Ok, I stuffed it in my linux-next branch.  I'll let it sit there for a day or 
> so though, just to shake out any build problems etc. in linux-next before 
> asking Linus to pull the whole lot.

For me it seems a little risky to push up that early. I think it would 
be better to let it sit longer in linux-next for this. After all this is 
a major change how IO resources are allocated. That is why I didn't pull
it into the ACPI release branch.

-Andi


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: acpi based pci gap calculation  - v3
  2008-07-15 18:59 acpi based pci gap calculation - v3 Alok Kataria
  2008-07-15 20:28 ` Jesse Barnes
@ 2008-07-16  5:42 ` Andi Kleen
  1 sibling, 0 replies; 17+ messages in thread
From: Andi Kleen @ 2008-07-16  5:42 UTC (permalink / raw)
  To: akataria; +Cc: Ingo Molnar, Brown, Len, LKML, linux-acpi

Alok Kataria wrote:
> Hi Andi,
> 
> I don't see this patch being applied in any of the tree yet.
> Resending this incase you guys might have missed it. 

Sorry still some backlog from vacation last week. However the focus
right now is on patches for 2.6.27 and I don't think your patch
is really tested well enough yet for that. It is fairly intrusive
and changes a critical piece of code. More likely it's a .28 thing,
or possibly a late merge candidate (but that's also not clear)

-Andi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: acpi based pci gap calculation  - v3
  2008-07-16  5:40       ` Andi Kleen
@ 2008-07-16 16:06         ` Jesse Barnes
  2008-07-16 16:33           ` Andi Kleen
  0 siblings, 1 reply; 17+ messages in thread
From: Jesse Barnes @ 2008-07-16 16:06 UTC (permalink / raw)
  To: Andi Kleen
  Cc: akataria, Ingo Molnar, Brown, Len, LKML, linux-acpi,
	linux-pci@vger.kernel.org, TJ

On Tuesday, July 15, 2008 10:40 pm Andi Kleen wrote:
> Jesse Barnes wrote:
> > On Tuesday, July 15, 2008 1:54 pm Alok Kataria wrote:
> >> I have tested it with couple of different BIOS settings and it seems to
> >> work as it should.
> >>
> >>> Most of the PCI bugs we
> >>> have at the moment are related to PCI resource allocation failures.  My
> >>> hope is that finding more space will fix most of them.  Assuming this
> >>> patch doesn't have any dependencies, I can put it in my linux-next
> >>> branch.
> >>
> >> No dependencies, I had added a function e820_search_gap which is used by
> >> this patch. This function is already in the mainline tree.
> >> Thanks for applying.
> >
> > Ok, I stuffed it in my linux-next branch.  I'll let it sit there for a
> > day or so though, just to shake out any build problems etc. in linux-next
> > before asking Linus to pull the whole lot.
>
> For me it seems a little risky to push up that early. I think it would
> be better to let it sit longer in linux-next for this. After all this is
> a major change how IO resources are allocated. That is why I didn't pull
> it into the ACPI release branch.

The only problem there is that linux-next doesn't get nearly the sort of 
testing coverage we need for this kind of change.  It's also small, and 
reverting it is easy, so if we run into big problems that we can't fix we can 
always back it out...

I'm interested in hearing people's thoughts on this.

Jesse

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: acpi based pci gap calculation  - v3
  2008-07-16 16:06         ` Jesse Barnes
@ 2008-07-16 16:33           ` Andi Kleen
  2008-07-17  0:03             ` Jesse Barnes
  0 siblings, 1 reply; 17+ messages in thread
From: Andi Kleen @ 2008-07-16 16:33 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: akataria, Ingo Molnar, Brown, Len, LKML, linux-acpi,
	linux-pci@vger.kernel.org, TJ


> The only problem there is that linux-next doesn't get nearly the sort of 
> testing coverage we need for this kind of change.

Normally I tend to wait for one -mm release, which seems to be tested
by a reasonable number of people. If it survives that it is good
to be tested in Linus' tree.

Just stuffing this in in literally the last minute doesn't seem
like a good idea.

-Andi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: acpi based pci gap calculation  - v3
  2008-07-16 16:33           ` Andi Kleen
@ 2008-07-17  0:03             ` Jesse Barnes
  2008-07-17 21:31               ` Alok Kataria
  0 siblings, 1 reply; 17+ messages in thread
From: Jesse Barnes @ 2008-07-17  0:03 UTC (permalink / raw)
  To: Andi Kleen
  Cc: akataria, Ingo Molnar, Brown, Len, LKML, linux-acpi,
	linux-pci@vger.kernel.org, TJ

On Wednesday, July 16, 2008 9:33 am Andi Kleen wrote:
> > The only problem there is that linux-next doesn't get nearly the sort of
> > testing coverage we need for this kind of change.
>
> Normally I tend to wait for one -mm release, which seems to be tested
> by a reasonable number of people. If it survives that it is good
> to be tested in Linus' tree.
>
> Just stuffing this in in literally the last minute doesn't seem
> like a good idea.

Well it's hardly last minute given that the merge window only opened a couple 
of days ago...

But beyond that, now that I've thought about it a bit more I'm not even sure 
the patch is really correct (though it works on my test machines).  Shouldn't 
we be looking at _PRS not _CRS?  And ideally we should try to find even more 
space, not less.  This patch made one of my machines lose quite a bit of 
space:

...
Allocating PCI resources starting at c0000000 (gap: bf000000:40f00000)
...
ACPI: PCI resources should start at c0000000 (gap: bf000000:31000000)
...

which is a step backwards.  With that in mind, I reverted the patch before 
asking Linus to pull; I'm hopeful we can do better though.  I'd love to never 
see "resource allocation failed" messages anymore.

Thanks,
Jesse

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: acpi based pci gap calculation  - v3
  2008-07-17  0:03             ` Jesse Barnes
@ 2008-07-17 21:31               ` Alok Kataria
       [not found]                 ` <1216663147.26169.5.camel@alok-dev1>
  0 siblings, 1 reply; 17+ messages in thread
From: Alok Kataria @ 2008-07-17 21:31 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Andi Kleen, Ingo Molnar, Brown, Len, LKML, linux-acpi,
	linux-pci@vger.kernel.org, TJ

Hi Jesse,

On Wed, 2008-07-16 at 17:03 -0700, Jesse Barnes wrote:
> On Wednesday, July 16, 2008 9:33 am Andi Kleen wrote:
> > > The only problem there is that linux-next doesn't get nearly the sort of
> > > testing coverage we need for this kind of change.
> >
> > Normally I tend to wait for one -mm release, which seems to be tested
> > by a reasonable number of people. If it survives that it is good
> > to be tested in Linus' tree.
> >
> > Just stuffing this in in literally the last minute doesn't seem
> > like a good idea.
> 
> Well it's hardly last minute given that the merge window only opened a couple
> of days ago...
> 
> But beyond that, now that I've thought about it a bit more I'm not even sure
> the patch is really correct (though it works on my test machines).  Shouldn't
> we be looking at _PRS not _CRS? 

IMO, the current resource allocations will give us a better idea of
which region is free.
Besides, from what i read PRS is optional and not all BIOS's may export
that.

>  And ideally we should try to find even more
> space, not less.  This patch made one of my machines lose quite a bit of
> space:
> 
> ...
> Allocating PCI resources starting at c0000000 (gap: bf000000:40f00000)
> ...
> ACPI: PCI resources should start at c0000000 (gap: bf000000:31000000)
> ...
> 

Yep, we should try to find more space but we should also make sure that
this space doesn't clash with any other resource.
As explained in my first mail while posting v1 of these patches, the
kernel ignores the memory hotplug range while calculating this gap for
PCI, and consequently these regions collide. 
With this patch we have put some constraints like looking only in the
producer regions rather than all regions which are right now not
reserved/used by the e820 map.

However, looking back again i think we can improve this further in some
cases.
I have made some more changes in the e820_search_gap algorithm to find
the *biggest* un-utilized gap in the 32bit address range, and have added
some debug print messages.
Can you please try this patch on your system and mail me your full
dmesg.

Thanks,
Alok

---

 arch/x86/kernel/e820.c |   19 +++++++++++++++++++
 arch/x86/pci/acpi.c    |    4 ++++
 2 files changed, 23 insertions(+), 0 deletions(-)


diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index a5383ae..298aeec 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -539,6 +539,8 @@ __init int e820_search_gap(unsigned long *gapstart, unsigned long *gapsize,
 
 	last = (end_addr && end_addr < MAX_GAP_END) ? end_addr : MAX_GAP_END;
 
+	printk("E820_DEBUG: Searching for gap between (0x%08lx - 0x%08llx)\n",
+							 start_addr, end_addr);
 	while (--i >= 0) {
 		unsigned long long start = e820.map[i].addr;
 		unsigned long long end = start + e820.map[i].size;
@@ -556,9 +558,26 @@ __init int e820_search_gap(unsigned long *gapstart, unsigned long *gapsize,
 			if (gap >= *gapsize) {
 				*gapsize = gap;
 				*gapstart = end;
+				printk("E820_DEBUG: Found gap starting at "
+					"0x%08llx size 0x%08llx\n", end, gap);
 				found = 1;
 			}
 		}
+		if (start > start_addr) {
+			unsigned long gap, prev_end;
+			prev_end = e820.map[i-1].addr + e820.map[i-1].size;
+			if (start_addr > prev_end) {
+				gap = start - start_addr;
+				if (gap >=*gapsize) {
+					*gapsize = gap;
+                                *gapstart = start_addr;
+                                printk("E820_DEBUG: Found gap at start starting at "
+                                        "0x%08llx size 0x%08llx\n", end, gap);
+                                found = 1;
+				start = start_addr;
+                        	}
+			}
+		}
 		if (start < last)
 			last = start;
 	}
diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
index fff2c42..1a88149 100644
--- a/arch/x86/pci/acpi.c
+++ b/arch/x86/pci/acpi.c
@@ -131,8 +131,12 @@ static acpi_status search_gap(struct acpi_resource *resource, void *data)
 		/*
 		 * We want space only in the 32bit address range
 		 */
+		printk("ACPI_DEBUG start_addr 0x%08lx gapsize 0x%08lx "
+			"address_length 0x%08lx\n", start_addr, gap->gapsize,
+			addr.address_length);
 		if (start_addr < UINT_MAX) {
 			end_addr = start_addr + addr.address_length;
+			printk("\t\tend_addr is 0x%08lx\n", end_addr);
 			e820_search_gap(&gap->gapstart, &gap->gapsize,
 					start_addr, end_addr);
 		}



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: acpi based pci gap calculation  - v3
       [not found]                   ` <200807221450.52114.jbarnes@virtuousgeek.org>
@ 2008-07-22 22:52                     ` Alok Kataria
  2008-07-23  7:13                       ` TJ
  2008-07-23 16:58                       ` Jesse Barnes
  0 siblings, 2 replies; 17+ messages in thread
From: Alok Kataria @ 2008-07-22 22:52 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Andi Kleen, Ingo Molnar, Brown, Len, LKML, linux-acpi,
	linux-pci@vger.kernel.org, TJ

On Tue, 2008-07-22 at 14:50 -0700, Jesse Barnes wrote:
> On Monday, July 21, 2008 10:59 am Alok Kataria wrote:
> > Hi Jesse,
> >
> > Did you get a chance to try this patch on your box. Let me know what are
> > the values you get now.
> 
> Here's the dmesg from my box with the ACPI gap stuff applied.
> 
> I'm still more inclined to TJ's approach though; it should give us a lot more
> space for PCI devices; though you're right that avoiding conflicts is
> definitely important too...

Hi Jesse,

Thanks for sending the log.

In the log that you sent me, please note the following debug messages
-------
E820_DEBUG: Searching for gap between (0x00000000 - 0x100000000)
E820_DEBUG: Found gap starting at 0xbf000000 size 0x40f00000
Allocating PCI resources starting at c0000000 (gap: bf000000:40f00000)
-------

This is the gap that was allocated by walking just the e820_map

With my changes we query the _CRS resource and get following info
------
ACPI_DEBUG start_addr 0xf8000000 gapsize 0x00400000 address_length 0x06b00000
                end_addr is 0xfeb00000
E820_DEBUG: Searching for gap between (0xf8000000 - 0xfeb00000)
E820_DEBUG: Found gap at start starting at 0x100000000 size 0x07f00000
ACPI_DEBUG start_addr 0xbf000000 gapsize 0x07f00000 address_length 0x31000000
                end_addr is 0xf0000000
E820_DEBUG: Searching for gap between (0xbf000000 - 0xf0000000)
E820_DEBUG: Found gap starting at 0xbf000000 size 0x31000000
------

So there are 2 producer regions one from [0xBF000000 - 0xF0000000] and
another from [0xF8000000 - 0xFEB00000]. That means BIOS has reserved the
area from [0xF0000000 - 0xF7FFFFFF] for some other resource.
If you look a little below in the log there is this

----
system 00:01: iomem range 0xf0000000-0xf7ffffff has been reserved
----

So the gap that we had calculated first i.e. from e820_setup_gap did
contain a collision i.e. though a resource was reserved from 
[0xf0000000 - 0xf7ffffff] our gap calculation doesn't take that into
account.  My patch fixes this issue.

So, IMHO this is a BUG and should be fixed. Please let me know your
views. 

Thanks,
Alok


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: acpi based pci gap calculation  - v3
  2008-07-22 22:52                     ` Alok Kataria
@ 2008-07-23  7:13                       ` TJ
  2008-07-23 16:58                       ` Jesse Barnes
  1 sibling, 0 replies; 17+ messages in thread
From: TJ @ 2008-07-23  7:13 UTC (permalink / raw)
  To: linux-pci@vger.kernel.org; +Cc: linux-acpi

Alok's observations about the ACPI PCI0._CRS() package ResourceProducer
AddressRangeMemory definitions is the same approach as my optional
PCI-DRAM approach. Possibly where we differ is in how much weight is
given to the DSDT values since I found whilst analysing several hundred
DSDTs over the past year or so that the values returned could often be
'iffy' or over-vealous.

For example, there's a bug in the DSDT of many Sony Vaios whereby the
Intel 945 PCI Root Bridge's TOLUD (Top Of Low User Dynamic ram) register
is defined as a 6-bit wide field rather than the actual 5 bits, and is
then used in _CRS() by the AML to calculate and set the min, max, and
length of a range that represents installed RAM.

I've also seen instances of large areas being reserved 'just in case'
which might be all very well for servers with realistic hot-plug
opportunities but definitely cramps the style for notebooks and most
desktops.

I've decided to publish my code so at least others evaluate it and get
some experience of real-world situations. It provides command-line
over-rides for all options so it is easy to try different policies, and
via sysfs it can be changed on-the-fly. I'm not going to propose it as a
patch-set for submission yet, though.

It'll be next week before I get chance to publish - so much to do, too
much fun to be had!

TJ.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: acpi based pci gap calculation  - v3
  2008-07-22 22:52                     ` Alok Kataria
  2008-07-23  7:13                       ` TJ
@ 2008-07-23 16:58                       ` Jesse Barnes
  2008-07-23 17:10                         ` Alok Kataria
  1 sibling, 1 reply; 17+ messages in thread
From: Jesse Barnes @ 2008-07-23 16:58 UTC (permalink / raw)
  To: akataria
  Cc: Andi Kleen, Ingo Molnar, Brown, Len, LKML, linux-acpi,
	linux-pci@vger.kernel.org, TJ

On Tuesday, July 22, 2008 3:52 pm Alok Kataria wrote:
> In the log that you sent me, please note the following debug messages
> -------
> E820_DEBUG: Searching for gap between (0x00000000 - 0x100000000)
> E820_DEBUG: Found gap starting at 0xbf000000 size 0x40f00000
> Allocating PCI resources starting at c0000000 (gap: bf000000:40f00000)
> -------
>
> This is the gap that was allocated by walking just the e820_map
>
> With my changes we query the _CRS resource and get following info
> ------
> ACPI_DEBUG start_addr 0xf8000000 gapsize 0x00400000 address_length
> 0x06b00000 end_addr is 0xfeb00000
> E820_DEBUG: Searching for gap between (0xf8000000 - 0xfeb00000)
> E820_DEBUG: Found gap at start starting at 0x100000000 size 0x07f00000
> ACPI_DEBUG start_addr 0xbf000000 gapsize 0x07f00000 address_length
> 0x31000000 end_addr is 0xf0000000
> E820_DEBUG: Searching for gap between (0xbf000000 - 0xf0000000)
> E820_DEBUG: Found gap starting at 0xbf000000 size 0x31000000
> ------
>
> So there are 2 producer regions one from [0xBF000000 - 0xF0000000] and
> another from [0xF8000000 - 0xFEB00000]. That means BIOS has reserved the
> area from [0xF0000000 - 0xF7FFFFFF] for some other resource.
> If you look a little below in the log there is this
>
> ----
> system 00:01: iomem range 0xf0000000-0xf7ffffff has been reserved
> ----
>
> So the gap that we had calculated first i.e. from e820_setup_gap did
> contain a collision i.e. though a resource was reserved from
> [0xf0000000 - 0xf7ffffff] our gap calculation doesn't take that into
> account.  My patch fixes this issue.
>
> So, IMHO this is a BUG and should be fixed. Please let me know your
> views.

Yes, there's a conflict there, but on many machines it's probably a harmless 
one.  My main concerns are these:
  1) it changes long standing behavior and doesn't fix any real reported bugs
     I'm aware of (feel free to point me at some)
  2) it looks like it will dramatically reduce the available PCI resource
     space on at least some platforms, and that space is already scarce in our
     current scheme

So basically, I'm just feeling very conservative about any changes to resource 
allocation, that's all.

If this change were coupled with one that allowed us to exploit more available 
address space for PCI resources I'd feel much more comfortable about it, 
which is why I'm so interested in TJ's work (/me goes off to read his wiki 
now).

Thanks,
Jesse

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: acpi based pci gap calculation  - v3
  2008-07-23 16:58                       ` Jesse Barnes
@ 2008-07-23 17:10                         ` Alok Kataria
  2008-07-23 17:52                           ` Jesse Barnes
  0 siblings, 1 reply; 17+ messages in thread
From: Alok Kataria @ 2008-07-23 17:10 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Andi Kleen, Ingo Molnar, Len, LKML, linux-acpi,
	linux-pci@vger.kernel.org, TJ

Hi Jesse,

On Wed, 2008-07-23 at 09:58 -0700, Jesse Barnes wrote:
> On Tuesday, July 22, 2008 3:52 pm Alok Kataria wrote:
> > So the gap that we had calculated first i.e. from e820_setup_gap did
> > contain a collision i.e. though a resource was reserved from
> > [0xf0000000 - 0xf7ffffff] our gap calculation doesn't take that into
> > account.  My patch fixes this issue.
> >
> > So, IMHO this is a BUG and should be fixed. Please let me know your
> > views.
> 
> Yes, there's a conflict there, but on many machines it's probably a harmless
> one.  My main concerns are these:
>   1) it changes long standing behavior and doesn't fix any real reported bugs
>      I'm aware of (feel free to point me at some)

The problem that I see is with Memory Hotadd, though my BIOS exports the
correct SRAT table and tells the OS that some regions are hot pluggable,
PCI gap calculation ignores that info and assigns some "option ROMS" in
hot pluggable memory region. 

Because of this when i try to hot add memory, the OS sees a resource
allocation conflict and bails out. Which shouldn't have happened,
right ?

>   2) it looks like it will dramatically reduce the available PCI resource
>      space on at least some platforms, and that space is already scarce in our
>      current scheme

IMO, these gaps are used only for the option ROMS or hot pluggable
devices which in itself are rare. So its not like we are reducing the
whole available PCI resource space, but only the space that is needed by
these optional ROMS.
And i think its inevitable if we have to remove these conflicts with any
other subsystem. 
> 
> So basically, I'm just feeling very conservative about any changes to resource
> allocation, that's all.
> 
> If this change were coupled with one that allowed us to exploit more available
> address space for PCI resources I'd feel much more comfortable about it,
> which is why I'm so interested in TJ's work (/me goes off to read his wiki
> now).

/me to having a look.

Thanks,
Alok
> 
> Thanks,
> Jesse


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: acpi based pci gap calculation  - v3
  2008-07-23 17:10                         ` Alok Kataria
@ 2008-07-23 17:52                           ` Jesse Barnes
  2008-07-23 18:02                             ` Alok Kataria
  0 siblings, 1 reply; 17+ messages in thread
From: Jesse Barnes @ 2008-07-23 17:52 UTC (permalink / raw)
  To: akataria
  Cc: Andi Kleen, Ingo Molnar, Len, LKML, linux-acpi,
	linux-pci@vger.kernel.org, TJ

On Wednesday, July 23, 2008 10:10 am Alok Kataria wrote:
> Hi Jesse,
>
> On Wed, 2008-07-23 at 09:58 -0700, Jesse Barnes wrote:
> > On Tuesday, July 22, 2008 3:52 pm Alok Kataria wrote:
> > > So the gap that we had calculated first i.e. from e820_setup_gap did
> > > contain a collision i.e. though a resource was reserved from
> > > [0xf0000000 - 0xf7ffffff] our gap calculation doesn't take that into
> > > account.  My patch fixes this issue.
> > >
> > > So, IMHO this is a BUG and should be fixed. Please let me know your
> > > views.
> >
> > Yes, there's a conflict there, but on many machines it's probably a
> > harmless one.  My main concerns are these:
> >   1) it changes long standing behavior and doesn't fix any real reported
> > bugs I'm aware of (feel free to point me at some)
>
> The problem that I see is with Memory Hotadd, though my BIOS exports the
> correct SRAT table and tells the OS that some regions are hot pluggable,
> PCI gap calculation ignores that info and assigns some "option ROMS" in
> hot pluggable memory region.
>
> Because of this when i try to hot add memory, the OS sees a resource
> allocation conflict and bails out. Which shouldn't have happened,
> right ?

Ah ok, so you've got a real problem here.  And I don't deny that there's a 
conflict that we should fix.

> >   2) it looks like it will dramatically reduce the available PCI resource
> >      space on at least some platforms, and that space is already scarce
> > in our current scheme
>
> IMO, these gaps are used only for the option ROMS or hot pluggable
> devices which in itself are rare. So its not like we are reducing the
> whole available PCI resource space, but only the space that is needed by
> these optional ROMS.
> And i think its inevitable if we have to remove these conflicts with any
> other subsystem.

Actually we have other allocations.  In many cases the BIOS won't assign MMIO 
resources, so we do it at boot time.  Most of the open PCI bugs we have today 
are resource allocation failures, and not just for ROMs, so hopefully you can 
understand my worry.  Using all available gaps rather than just the largest 
seems like the obvious first step to increasing our available space, and 
would give us more leeway to avoid stomping on other reserved space.

Jesse

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: acpi based pci gap calculation  - v3
  2008-07-23 17:52                           ` Jesse Barnes
@ 2008-07-23 18:02                             ` Alok Kataria
  0 siblings, 0 replies; 17+ messages in thread
From: Alok Kataria @ 2008-07-23 18:02 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Andi Kleen, Ingo Molnar, Len, LKML, linux-acpi,
	linux-pci@vger.kernel.org, TJ

On Wed, 2008-07-23 at 10:52 -0700, Jesse Barnes wrote:
> On Wednesday, July 23, 2008 10:10 am Alok Kataria wrote:
> > Hi Jesse,
> >
> > On Wed, 2008-07-23 at 09:58 -0700, Jesse Barnes wrote:
> > > On Tuesday, July 22, 2008 3:52 pm Alok Kataria wrote:
> > > > So the gap that we had calculated first i.e. from e820_setup_gap did
> > > > contain a collision i.e. though a resource was reserved from
> > > > [0xf0000000 - 0xf7ffffff] our gap calculation doesn't take that into
> > > > account.  My patch fixes this issue.
> > > >
> > > > So, IMHO this is a BUG and should be fixed. Please let me know your
> > > > views.
> > >
> > > Yes, there's a conflict there, but on many machines it's probably a
> > > harmless one.  My main concerns are these:
> > >   1) it changes long standing behavior and doesn't fix any real reported
> > > bugs I'm aware of (feel free to point me at some)
> >
> > The problem that I see is with Memory Hotadd, though my BIOS exports the
> > correct SRAT table and tells the OS that some regions are hot pluggable,
> > PCI gap calculation ignores that info and assigns some "option ROMS" in
> > hot pluggable memory region.
> >
> > Because of this when i try to hot add memory, the OS sees a resource
> > allocation conflict and bails out. Which shouldn't have happened,
> > right ?
> 
> Ah ok, so you've got a real problem here.  And I don't deny that there's a
> conflict that we should fix.

Thanks :-)

> 
> > >   2) it looks like it will dramatically reduce the available PCI resource
> > >      space on at least some platforms, and that space is already scarce
> > > in our current scheme
> >
> > IMO, these gaps are used only for the option ROMS or hot pluggable
> > devices which in itself are rare. So its not like we are reducing the
> > whole available PCI resource space, but only the space that is needed by
> > these optional ROMS.
> > And i think its inevitable if we have to remove these conflicts with any
> > other subsystem.
> 
> Actually we have other allocations.  In many cases the BIOS won't assign MMIO
> resources, so we do it at boot time.  Most of the open PCI bugs we have today
> are resource allocation failures, and not just for ROMs, so hopefully you can
> understand my worry.  Using all available gaps rather than just the largest
> seems like the obvious first step to increasing our available space, and
> would give us more leeway to avoid stomping on other reserved space.
> 

Yeah i agree that this should be the approach. 
I went through TJ's wiki and in the solutions section the first point
talks about 
"keeping a list of all available PCI iomem regions which will be derived
from the e820 map (and the CRS object of the root PCI device). 

TJ/Jesse, do we have any patches/work-in-progress which actually does
this ?
I can then add the CRS object stuff to that then. 

Thanks,
Alok



^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2008-07-23 18:02 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-07-15 18:59 acpi based pci gap calculation - v3 Alok Kataria
2008-07-15 20:28 ` Jesse Barnes
2008-07-15 20:54   ` Alok Kataria
2008-07-15 22:53     ` Jesse Barnes
2008-07-15 23:07       ` Ingo Molnar
2008-07-16  5:40       ` Andi Kleen
2008-07-16 16:06         ` Jesse Barnes
2008-07-16 16:33           ` Andi Kleen
2008-07-17  0:03             ` Jesse Barnes
2008-07-17 21:31               ` Alok Kataria
     [not found]                 ` <1216663147.26169.5.camel@alok-dev1>
     [not found]                   ` <200807221450.52114.jbarnes@virtuousgeek.org>
2008-07-22 22:52                     ` Alok Kataria
2008-07-23  7:13                       ` TJ
2008-07-23 16:58                       ` Jesse Barnes
2008-07-23 17:10                         ` Alok Kataria
2008-07-23 17:52                           ` Jesse Barnes
2008-07-23 18:02                             ` Alok Kataria
2008-07-16  5:42 ` Andi Kleen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox