linux-kernel.vger.kernel.org archive mirror
* [PATCH 1/3] x86/PCI: allocate space from the end of a region, not the beginning
       [not found] <20100915210818.12365.58732.stgit@bob.kio>
@ 2010-09-15 21:09 ` Bjorn Helgaas
  2010-09-15 21:09 ` [PATCH 2/3] resources: allocate space within a region from the top down Bjorn Helgaas
  2010-09-15 21:09 ` [PATCH 3/3] PCI: allocate bus resources " Bjorn Helgaas
  2 siblings, 0 replies; 10+ messages in thread
From: Bjorn Helgaas @ 2010-09-15 21:09 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Brian Bloniarz, Charles Butterfield, Denys Vlasenko, linux-pci,
	linux-kernel, Stefan Becker, H. Peter Anvin, Yinghai Lu,
	Thomas Gleixner, Linus Torvalds, Ingo Molnar


Allocate from the end of a region, not the beginning.

For example, if we need to allocate 0x800 bytes for a device on bus
0000:00 given these resources:

    [mem 0xbff00000-0xdfffffff] PCI Bus 0000:00
      [mem 0xc0000000-0xdfffffff] PCI Bus 0000:02

the available space at [mem 0xbff00000-0xbfffffff] is passed to the
alignment callback (pcibios_align_resource()).  Prior to this patch, we
would put the new 0x800 byte resource at the beginning of that available
space, i.e., at [mem 0xbff00000-0xbff007ff].

With this patch, we put it at the end, at [mem 0xbffff800-0xbfffffff].
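The arithmetic above can be checked in user space. This is only a sketch: the helper names below are hypothetical and not part of the patch. The first helper rounds up the way the kernel's ALIGN() does; the second rounds down for comparison. When the size is smaller than the alignment, rounding up can land past the end of the available space, while rounding down stays inside it. Whether the caller rejects such a result is outside what this sketch models.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of align (align must be a power of two). */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/* Round x down to a multiple of align (align must be a power of two). */
static uint64_t round_down_pow2(uint64_t x, uint64_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return x & ~(align - 1);
}

/*
 * Hypothetical helper: choose a start address for "size" bytes at the end
 * of the inclusive range [lo, hi], rounding up, then clamping to lo.
 */
static uint64_t end_start_round_up(uint64_t lo, uint64_t hi,
				   uint64_t size, uint64_t align)
{
	uint64_t start = round_up_pow2(hi - size + 1, align);

	if (start < lo)
		start = lo;
	return start;
}

/* Same as above, but rounding down so the block cannot pass hi. */
static uint64_t end_start_round_down(uint64_t lo, uint64_t hi,
				     uint64_t size, uint64_t align)
{
	uint64_t start = round_down_pow2(hi - size + 1, align);

	if (start < lo)
		start = lo;
	return start;
}
```

With lo = 0xbff00000, hi = 0xbfffffff, size = 0x800 and align = 0x800, both helpers return 0xbffff800, matching the example in this changelog.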

Reference: https://bugzilla.kernel.org/show_bug.cgi?id=16228#c41
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
---

 arch/x86/pci/i386.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)


diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
index 5525309..1ff3e9f 100644
--- a/arch/x86/pci/i386.c
+++ b/arch/x86/pci/i386.c
@@ -65,7 +65,10 @@ pcibios_align_resource(void *data, const struct resource *res,
 			resource_size_t size, resource_size_t align)
 {
 	struct pci_dev *dev = data;
-	resource_size_t start = res->start;
+	resource_size_t start = ALIGN(res->end - size + 1, align);
+
+	if (start < res->start)
+		start = res->start;
 
 	if (res->flags & IORESOURCE_IO) {
 		if (skip_isa_ioresource_align(dev))



* [PATCH 2/3] resources: allocate space within a region from the top down
       [not found] <20100915210818.12365.58732.stgit@bob.kio>
  2010-09-15 21:09 ` [PATCH 1/3] x86/PCI: allocate space from the end of a region, not the beginning Bjorn Helgaas
@ 2010-09-15 21:09 ` Bjorn Helgaas
  2010-09-15 21:09 ` [PATCH 3/3] PCI: allocate bus resources " Bjorn Helgaas
  2 siblings, 0 replies; 10+ messages in thread
From: Bjorn Helgaas @ 2010-09-15 21:09 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Brian Bloniarz, Charles Butterfield, Denys Vlasenko, linux-pci,
	linux-kernel, Stefan Becker, H. Peter Anvin, Yinghai Lu,
	Thomas Gleixner, Linus Torvalds, Ingo Molnar


Allocate space from the top of a region first, then work downward.

When we allocate space from a resource, we look for gaps between children
of the resource.  Previously, we looked at gaps from the bottom up.  For
example, given this:

    [mem 0xbff00000-0xf7ffffff] PCI Bus 0000:00
      [mem 0xc0000000-0xdfffffff] PCI Bus 0000:02

we attempted to allocate from the [mem 0xbff00000-0xbfffffff] gap first,
then the [mem 0xe0000000-0xf7ffffff] gap.

With this patch, we allocate from [mem 0xe0000000-0xf7ffffff] first.

Low addresses are generally scarce, so it's better to use high addresses
when possible.  This follows Windows practice for PCI allocation.
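The top-down gap walk can be sketched in user space as follows. This is a simplified model, not the kernel code: "struct region" and "highest_gap" are hypothetical names, min/max limits and alignment callbacks are left out, and the sketch assumes no child ends at the maximum 64-bit address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct resource; ranges are inclusive. */
struct region {
	uint64_t start, end;
	struct region *child;	/* first (lowest-address) child */
	struct region *sibling;	/* next child at a higher address */
};

/* Return the child whose sibling is "child"; NULL "child" yields the last one. */
static struct region *prev_sibling(struct region *root, struct region *child)
{
	struct region *r;

	for (r = root->child; r; r = r->sibling)
		if (r->sibling == child)
			return r;
	return NULL;
}

/*
 * Find the highest gap between children of root that holds "size" bytes.
 * Returns 0 and fills in the gap bounds, or -1 if no gap is large enough.
 */
static int highest_gap(struct region *root, uint64_t size,
		       uint64_t *gap_start, uint64_t *gap_end)
{
	uint64_t end = root->end;
	struct region *this = prev_sibling(root, NULL);

	assert(size > 0);
	for (;;) {
		uint64_t start = this ? this->end + 1 : root->start;

		if (start <= end && end - start >= size - 1) {
			*gap_start = start;
			*gap_end = end;
			return 0;
		}
		/* Stop before computing this->start - 1 at the bottom of root. */
		if (!this || this->start == root->start)
			break;
		end = this->start - 1;
		this = prev_sibling(root, this);
	}
	return -1;
}
```

For the example above, a request for 0x800 bytes yields the gap [0xe0000000-0xf7ffffff] rather than [0xbff00000-0xbfffffff]. Because the sibling list is singly linked, each step backward rescans from the first child, so the walk is quadratic in the number of children.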

Reference: https://bugzilla.kernel.org/show_bug.cgi?id=16228#c42
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
---

 kernel/resource.c |   40 ++++++++++++++++++++++++----------------
 1 files changed, 24 insertions(+), 16 deletions(-)


diff --git a/kernel/resource.c b/kernel/resource.c
index 7b36976..e83ff7c 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -358,6 +358,20 @@ int __weak page_is_ram(unsigned long pfn)
 }
 
 /*
+ * Find the resource before "child" in the sibling list of "root" children.
+ */
+static struct resource *find_sibling_prev(struct resource *root, struct resource *child)
+{
+	struct resource *this;
+
+	for (this = root->child; this; this = this->sibling)
+		if (this->sibling == child)
+			return this;
+
+	return NULL;
+}
+
+/*
  * Find empty slot in the resource tree given range and alignment.
  */
 static int find_resource(struct resource *root, struct resource *new,
@@ -369,23 +383,17 @@ static int find_resource(struct resource *root, struct resource *new,
 						   resource_size_t),
 			 void *alignf_data)
 {
-	struct resource *this = root->child;
+	struct resource *this;
 	struct resource tmp = *new;
 
-	tmp.start = root->start;
-	/*
-	 * Skip past an allocated resource that starts at 0, since the assignment
-	 * of this->start - 1 to tmp->end below would cause an underflow.
-	 */
-	if (this && this->start == 0) {
-		tmp.start = this->end + 1;
-		this = this->sibling;
-	}
-	for(;;) {
+	tmp.end = root->end;
+
+	this = find_sibling_prev(root, NULL);
+	for (;;) {
 		if (this)
-			tmp.end = this->start - 1;
+			tmp.start = this->end + 1;
 		else
-			tmp.end = root->end;
+			tmp.start = root->start;
 		if (tmp.start < min)
 			tmp.start = min;
 		if (tmp.end > max)
@@ -398,10 +406,10 @@ static int find_resource(struct resource *root, struct resource *new,
 			new->end = tmp.start + size - 1;
 			return 0;
 		}
-		if (!this)
+		if (!this || this->start == root->start)
 			break;
-		tmp.start = this->end + 1;
-		this = this->sibling;
+		tmp.end = this->start - 1;
+		this = find_sibling_prev(root, this);
 	}
 	return -EBUSY;
 }



* [PATCH 3/3] PCI: allocate bus resources from the top down
       [not found] <20100915210818.12365.58732.stgit@bob.kio>
  2010-09-15 21:09 ` [PATCH 1/3] x86/PCI: allocate space from the end of a region, not the beginning Bjorn Helgaas
  2010-09-15 21:09 ` [PATCH 2/3] resources: allocate space within a region from the top down Bjorn Helgaas
@ 2010-09-15 21:09 ` Bjorn Helgaas
  2010-09-15 21:50   ` H. Peter Anvin
  2 siblings, 1 reply; 10+ messages in thread
From: Bjorn Helgaas @ 2010-09-15 21:09 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Brian Bloniarz, Charles Butterfield, Denys Vlasenko, linux-pci,
	linux-kernel, Stefan Becker, H. Peter Anvin, Yinghai Lu,
	Thomas Gleixner, Linus Torvalds, Ingo Molnar


Allocate space from the highest-address PCI bus resource first, then work
downward.

Previously, we looked for space in PCI host bridge windows in the order
we discovered the windows.  For example, given the following windows
(discovered via an ACPI _CRS method):

    pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff]
    pci_root PNP0A03:00: host bridge window [mem 0x000c0000-0x000effff]
    pci_root PNP0A03:00: host bridge window [mem 0x000f0000-0x000fffff]
    pci_root PNP0A03:00: host bridge window [mem 0xbff00000-0xf7ffffff]
    pci_root PNP0A03:00: host bridge window [mem 0xff980000-0xff980fff]
    pci_root PNP0A03:00: host bridge window [mem 0xff97c000-0xff97ffff]
    pci_root PNP0A03:00: host bridge window [mem 0xfed20000-0xfed9ffff]

we attempted to allocate from [mem 0x000a0000-0x000bffff] first, then
[mem 0x000c0000-0x000effff], and so on.

With this patch, we allocate from [mem 0xff980000-0xff980fff] first, then
[mem 0xff97c000-0xff97ffff], [mem 0xfed20000-0xfed9ffff], etc.
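The visiting order can be illustrated with a small user-space sketch. "struct window", "ranks_higher" and "prev_window" are hypothetical names; the sketch ignores resource types and, unlike the patch below, skips windows whose range is identical to the cursor's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified host bridge window; the range is inclusive. */
struct window {
	uint64_t start, end;
};

/*
 * Nonzero if a ranks higher than b: the larger end address wins, and
 * for equal ends the larger start (that is, the smaller window) wins.
 */
static int ranks_higher(const struct window *a, const struct window *b)
{
	if (a->end != b->end)
		return a->end > b->end;
	return a->start > b->start;
}

/*
 * Return the highest-ranked window strictly below "cursor", or the
 * highest-ranked window overall when cursor is NULL.
 */
static const struct window *prev_window(const struct window *w, size_t n,
					const struct window *cursor)
{
	const struct window *best = NULL;
	size_t i;

	assert(w != NULL || n == 0);
	for (i = 0; i < n; i++) {
		const struct window *r = &w[i];

		/* Skip the cursor itself and anything at or above it. */
		if (cursor && (r == cursor || !ranks_higher(cursor, r)))
			continue;
		if (!best || ranks_higher(r, best))
			best = r;
	}
	return best;
}
```

Calling prev_window() repeatedly, feeding each result back in as the cursor, visits the seven windows listed above from the highest address to the lowest, regardless of their discovery order.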

Allocating top-down follows Windows practice, so we're less likely to
trip over BIOS defects in the _CRS description.

On the machine above (a Dell T3500), the [mem 0xbff00000-0xbfffffff] region
doesn't actually work and is likely a BIOS defect.  The symptom is that we
move the AHCI controller to 0xbff00000, which leads to "Boot has failed,
sleeping forever," a BUG in ahci_stop_engine(), or some other boot failure.

Reference: https://bugzilla.kernel.org/show_bug.cgi?id=16228#c43
Reference: https://bugzilla.redhat.com/show_bug.cgi?id=620313
Reference: https://bugzilla.redhat.com/show_bug.cgi?id=629933
Reported-by: Brian Bloniarz <phunge0@hotmail.com>
Reported-and-tested-by: Stefan Becker <chemobejk@gmail.com>
Reported-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
---

 drivers/pci/bus.c |   53 ++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 files changed, 48 insertions(+), 5 deletions(-)


diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 7f0af0e..172bf26 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -64,6 +64,49 @@ void pci_bus_remove_resources(struct pci_bus *bus)
 	}
 }
 
+/*
+ * Find the highest-address bus resource below the cursor "res".  If the
+ * cursor is NULL, return the highest resource.
+ */
+static struct resource *pci_bus_find_resource_prev(struct pci_bus *bus,
+						   unsigned int type,
+						   struct resource *res)
+{
+	struct resource *r, *prev = NULL;
+	int i;
+
+	pci_bus_for_each_resource(bus, r, i) {
+		if (!r)
+			continue;
+
+		if ((r->flags & IORESOURCE_TYPE_BITS) != type)
+			continue;
+
+		/* If this resource is at or past the cursor, skip it */
+		if (res) {
+			if (r == res)
+				continue;
+			if (r->end > res->end)
+				continue;
+			if (r->end == res->end && r->start > res->start)
+				continue;
+		}
+
+		if (!prev)
+			prev = r;
+
+		/*
+		 * A small resource is higher than a large one that ends at
+		 * the same address.
+		 */
+		if (r->end > prev->end ||
+		    (r->end == prev->end && r->start > prev->start))
+			prev = r;
+	}
+
+	return prev;
+}
+
 /**
  * pci_bus_alloc_resource - allocate a resource from a parent bus
  * @bus: PCI bus
@@ -89,9 +132,10 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
 					  resource_size_t),
 		void *alignf_data)
 {
-	int i, ret = -ENOMEM;
+	int ret = -ENOMEM;
 	struct resource *r;
 	resource_size_t max = -1;
+	unsigned int type = res->flags & IORESOURCE_TYPE_BITS;
 
 	type_mask |= IORESOURCE_IO | IORESOURCE_MEM;
 
@@ -99,10 +143,9 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
 	if (!(res->flags & IORESOURCE_MEM_64))
 		max = PCIBIOS_MAX_MEM_32;
 
-	pci_bus_for_each_resource(bus, r, i) {
-		if (!r)
-			continue;
-
+	/* Look for space at highest addresses first */
+	r = pci_bus_find_resource_prev(bus, type, NULL);
+	for ( ; r; r = pci_bus_find_resource_prev(bus, type, r)) {
 		/* type_mask must match */
 		if ((res->flags ^ r->flags) & type_mask)
 			continue;



* Re: [PATCH 3/3] PCI: allocate bus resources from the top down
  2010-09-15 21:09 ` [PATCH 3/3] PCI: allocate bus resources " Bjorn Helgaas
@ 2010-09-15 21:50   ` H. Peter Anvin
  2010-09-15 22:44     ` Bjorn Helgaas
  0 siblings, 1 reply; 10+ messages in thread
From: H. Peter Anvin @ 2010-09-15 21:50 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jesse Barnes, Brian Bloniarz, Charles Butterfield, Denys Vlasenko,
	linux-pci, linux-kernel, Stefan Becker, Yinghai Lu,
	Thomas Gleixner, Linus Torvalds, Ingo Molnar

On 09/15/2010 02:09 PM, Bjorn Helgaas wrote:
> 
> On the machine above (a Dell T3500), the [mem 0xbff00000-0xbfffffff] region
> doesn't actually work and is likely a BIOS defect.  The symptom is that we
> move the AHCI controller to 0xbff00000, which leads to "Boot has failed,
> sleeping forever," a BUG in ahci_stop_engine(), or some other boot failure.
> 

Acked-by: H. Peter Anvin <hpa@zytor.com>

... for the patch in general, but I would like to *also* request a DMA
or PCI quirk to explicitly reserve the above range on the affected Dell
machines.

	-hpa


* Re: [PATCH 3/3] PCI: allocate bus resources from the top down
  2010-09-15 21:50   ` H. Peter Anvin
@ 2010-09-15 22:44     ` Bjorn Helgaas
  2010-09-15 23:45       ` H. Peter Anvin
  0 siblings, 1 reply; 10+ messages in thread
From: Bjorn Helgaas @ 2010-09-15 22:44 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jesse Barnes, Brian Bloniarz, Charles Butterfield, Denys Vlasenko,
	linux-pci, linux-kernel, Stefan Becker, Yinghai Lu,
	Thomas Gleixner, Linus Torvalds, Ingo Molnar

On Wednesday, September 15, 2010 03:50:14 pm H. Peter Anvin wrote:
> On 09/15/2010 02:09 PM, Bjorn Helgaas wrote:
> > 
> > On the machine above (a Dell T3500), the [mem 0xbff00000-0xbfffffff] region
> > doesn't actually work and is likely a BIOS defect.  The symptom is that we
> > move the AHCI controller to 0xbff00000, which leads to "Boot has failed,
> > sleeping forever," a BUG in ahci_stop_engine(), or some other boot failure.
> > 
> 
> Acked-by: H. Peter Anvin <hpa@zytor.com>
> 
> ... for the patch in general, but I would like to *also* request a DMA
> or PCI quirk to explicitly reserve the above range on the affected Dell
> machines.

I'd like to do that, but I don't see a good way to do it yet.

We saw the problem on a T3500, a T3400, and a T4500, and I'm sure there
are others.  So I don't know how to identify the affected machines.

And I don't know how to identify the invalid ranges, because I suspect
it depends on the memory size.  I think it would be quite unusual for
a window to start 1MB under the nice 256MB boundary, but I'm not sure
I'm ready to say that's always illegal.

On these machines, the [mem 0xbff00000-0xbfffffff] area is actually
reported as reserved in the E820 map, and I first thought we could
simply rely on that.  But I'm not really comfortable with that either,
because I don't think there's a dependable relationship between those
E820 entries and ACPI and PCI devices.  For one thing, I experimented
with Windows, and it happily places PCI devices in reserved areas,
and I think we're likely to trip over more BIOS bugs if we rely on
something Windows doesn't.

I suspect Windows would blow up, too, if we could somehow fill up the
rest of the window and force it to allocate the bottom.  But since
it's only a 1MB area, I think that would be very difficult to do
unless there's some way to tweak PCI BARs before booting Windows.

Bjorn


* Re: [PATCH 3/3] PCI: allocate bus resources from the top down
  2010-09-15 22:44     ` Bjorn Helgaas
@ 2010-09-15 23:45       ` H. Peter Anvin
  2010-09-16 17:04         ` Bjorn Helgaas
  0 siblings, 1 reply; 10+ messages in thread
From: H. Peter Anvin @ 2010-09-15 23:45 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jesse Barnes, Brian Bloniarz, Charles Butterfield, Denys Vlasenko,
	linux-pci, linux-kernel, Stefan Becker, Yinghai Lu,
	Thomas Gleixner, Linus Torvalds, Ingo Molnar

On 09/15/2010 03:44 PM, Bjorn Helgaas wrote:
> 
> I'd like to do that, but I don't see a good way to do it yet.
> 
> We saw the problem on a T3500, a T3400, and a T4500, and I'm sure there
> are others.  So I don't know how to identify the affected machines.
> 
> And I don't know how to identify the invalid ranges, because I suspect
> it depends on the memory size.  I think it would be quite unusual for
> a window to start 1MB under the nice 256MB boundary, but I'm not sure
> I'm ready to say that's always illegal.
> 
> On these machines, the [mem 0xbff00000-0xbfffffff] area is actually
> reported as reserved in the E820 map, and I first thought we could
> simply rely on that.  But I'm not really comfortable with that either,
> because I don't think there's a dependable relationship between those
> E820 entries and ACPI and PCI devices.  For one thing, I experimented
> with Windows, and it happily places PCI devices in reserved areas,
> and I think we're likely to trip over more BIOS bugs if we rely on
> something Windows doesn't.
> 
> I suspect Windows would blow up, too, if we could somehow fill up the
> rest of the window and force it to allocate the bottom.  But since
> it's only a 1MB area, I think that would be very difficult to do
> unless there's some way to tweak PCI BARs before booting Windows.
> 

If we put PCI devices in E820 RESERVED areas that's a bug, plain and
simple.  We should absolutely not be doing so!

To some degree I don't care if Windows does or not ... that is the
documented mechanism for reserving address space, and we should respect
that.  Furthermore, we use the same mechanism internally for reserving
address space.

Furthermore, the only potentially bad outcome is reserving excessive
address space, which is in general safer than the opposite.

So let's just fix this.

	-hpa


* Re: [PATCH 3/3] PCI: allocate bus resources from the top down
  2010-09-15 23:45       ` H. Peter Anvin
@ 2010-09-16 17:04         ` Bjorn Helgaas
  2010-09-16 17:44           ` H. Peter Anvin
  0 siblings, 1 reply; 10+ messages in thread
From: Bjorn Helgaas @ 2010-09-16 17:04 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jesse Barnes, Brian Bloniarz, Charles Butterfield, Denys Vlasenko,
	linux-pci, linux-kernel, Stefan Becker, Yinghai Lu,
	Thomas Gleixner, Linus Torvalds, Ingo Molnar

On Wednesday, September 15, 2010 05:45:54 pm H. Peter Anvin wrote:
> On 09/15/2010 03:44 PM, Bjorn Helgaas wrote:
> > 
> > I'd like to do that, but I don't see a good way to do it yet.
> > 
> > We saw the problem on a T3500, a T3400, and a T4500, and I'm sure there
> > are others.  So I don't know how to identify the affected machines.
> > 
> > And I don't know how to identify the invalid ranges, because I suspect
> > it depends on the memory size.  I think it would be quite unusual for
> > a window to start 1MB under the nice 256MB boundary, but I'm not sure
> > I'm ready to say that's always illegal.
> > 
> > On these machines, the [mem 0xbff00000-0xbfffffff] area is actually
> > reported as reserved in the E820 map, and I first thought we could
> > simply rely on that.  But I'm not really comfortable with that either,
> > because I don't think there's a dependable relationship between those
> > E820 entries and ACPI and PCI devices.  For one thing, I experimented
> > with Windows, and it happily places PCI devices in reserved areas,
> > and I think we're likely to trip over more BIOS bugs if we rely on
> > something Windows doesn't.
> > 
> > I suspect Windows would blow up, too, if we could somehow fill up the
> > rest of the window and force it to allocate the bottom.  But since
> > it's only a 1MB area, I think that would be very difficult to do
> > unless there's some way to tweak PCI BARs before booting Windows.
> 
> If we put PCI devices in E820 RESERVED areas that's a bug, plain and
> simple.  We should absolutely not be doing so!
> 
> To some degree I don't care if Windows does or not ... that is the
> documented mechanism for reserving address space, and we should respect
> that.  Furthermore, we use the same mechanism internally for reserving
> address space.

It does seem like we should do *something* with E820 reserved areas, but
I'm not 100% convinced we should be more strict than Windows.  If we
pay attention to things Windows doesn't test, I think we're likely to
trip over even more BIOS bugs.

Linux does avoid putting PCI devices in E820 reserved areas ... in
some cases.  In this Dell case, the reserved area conflicts with a
host bridge window, so we expand the reserved area and insert it as
the *parent* of the window.  Since it's the parent, it has no effect
on allocations from the window, so we end up putting devices in the
reserved area.

I think the problem is that E820 reservations fundamentally don't
fit well with the Linux resource manager.  We manage resources as
a strict hierarchy of non-overlapping regions, but there's no
requirement that E820 reservations have any relationship with actual
devices that we discover via ACPI, PCI, etc.

We've been kludging around this with a collection of hacks like
reserve_region_with_split() and insert_resource_expand_to_fit(),
but I think we're just making an unmaintainable mess.  We should
take a step back and think about how to do this cleanly.

Bjorn


* Re: [PATCH 3/3] PCI: allocate bus resources from the top down
  2010-09-16 17:04         ` Bjorn Helgaas
@ 2010-09-16 17:44           ` H. Peter Anvin
  2010-09-16 19:33             ` Yinghai Lu
  0 siblings, 1 reply; 10+ messages in thread
From: H. Peter Anvin @ 2010-09-16 17:44 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jesse Barnes, Brian Bloniarz, Charles Butterfield, Denys Vlasenko,
	linux-pci, linux-kernel, Stefan Becker, Yinghai Lu,
	Thomas Gleixner, Linus Torvalds, Ingo Molnar

On 09/16/2010 10:04 AM, Bjorn Helgaas wrote:
> It does seem like we should do *something* with E820 reserved areas, but
> I'm not 100% convinced we should be more strict than Windows.  If we
> pay attention to things Windows doesn't test, I think we're likely to
> trip over even more BIOS bugs.

Windows has a couple of advantages over us.  They have WHQL; every
machine needs to pass WHQL or it doesn't get sold.  The other advantage
is that most manufacturers of Windows desktops don't give a damn about
anything other than the shipping configuration (Windows version and
hardware): I have seen machines which fail to boot if you put in a PCI
UART card.

> Linux does avoid putting PCI devices in E820 reserved areas ... in
> some cases.  In this Dell case, the reserved area conflicts with a
> host bridge window, so we expand the reserved area and insert it as
> the *parent* of the window.  Since it's the parent, it has no effect
> on allocations from the window, so we end up putting devices in the
> reserved area.

OK, so that's a problem.  This isn't really a hideously uncommon use
case for a reserved region, either: it probably reflects a device used
by SMM under that particular host bridge.

> I think the problem is that E820 reservations fundamentally don't
> fit well with the Linux resource manager.  We manage resources as
> a strict hierarchy of non-overlapping regions, but there's no
> requirement that E820 reservations have any relationship with actual
> devices that we discover via ACPI, PCI, etc.
> 
> We've been kludging around this with a collection of hacks like
> reserve_region_with_split() and insert_resource_expand_to_fit(),
> but I think we're just making an unmaintainable mess.  We should
> take a step back and think about how to do this cleanly.

Perhaps we should consider reserved regions a separate hierarchy, or we
should deal with them at the time a new resource is created?

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



* Re: [PATCH 3/3] PCI: allocate bus resources from the top down
  2010-09-16 17:44           ` H. Peter Anvin
@ 2010-09-16 19:33             ` Yinghai Lu
  2010-09-16 20:02               ` H. Peter Anvin
  0 siblings, 1 reply; 10+ messages in thread
From: Yinghai Lu @ 2010-09-16 19:33 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Bjorn Helgaas, Jesse Barnes, Brian Bloniarz, Charles Butterfield,
	Denys Vlasenko, linux-pci, linux-kernel, Stefan Becker,
	Thomas Gleixner, Linus Torvalds, Ingo Molnar

On 09/16/2010 10:44 AM, H. Peter Anvin wrote:
> On 09/16/2010 10:04 AM, Bjorn Helgaas wrote:
>> It does seem like we should do *something* with E820 reserved areas, but
>> I'm not 100% convinced we should be more strict than Windows.  If we
>> pay attention to things Windows doesn't test, I think we're likely to
>> trip over even more BIOS bugs.
> 
> Windows have a couple of advantages on us.  They have WHQL; every
> machine needs to pass WHQL or it doesn't get sold.  The other advantage
> is that most manufacturers of Windows desktops don't give a damn about
> anything other than the shipping configuration (Windows version and
> hardware): I have seen machines which fail to boot if you put in a PCI
> UART card.
> 
>> Linux does avoid putting PCI devices in E820 reserved areas ... in
>> some cases.  In this Dell case, the reserved area conflicts with a
>> host bridge window, so we expand the reserved area and insert it as
>> the *parent* of the window.  Since it's the parent, it has no effect
>> on allocations from the window, so we end up putting devices in the
>> reserved area.
> 
> OK, so that's a problem.  This isn't really a hideously uncommon use
> case for a reserved region, either: it probably reflects a device used
> by SMM under that particular host bridge.
> 
>> I think the problem is that E820 reservations fundamentally don't
>> fit well with the Linux resource manager.  We manage resources as
>> a strict hierarchy of non-overlapping regions, but there's no
>> requirement that E820 reservations have any relationship with actual
>> devices that we discover via ACPI, PCI, etc.
>>
>> We've been kludging around this with a collection of hacks like
>> reserve_region_with_split() and insert_resource_expand_to_fit(),
>> but I think we're just making an unmaintainable mess.  We should
>> take a step back and think about how to do this cleanly.
> 
> Perhaps we should consider reserved regions a separate hierarchy, or we
> should deal with them at the time a new resource is created?

On those Dell machines, the BIOS sets a wrong value in the PCI bridge, and
the resulting window partially overlaps a reserved range in E820.

The kernel currently honors the hardware setting read from the PCI bridge
first.

So the best way should be to use the E820 reserved range to trim what we
read from PCI or ACPI in some cases.
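As a rough illustration of the trimming idea, here is a user-space sketch. "struct range" and "trim_window" are hypothetical names, not existing kernel interfaces, and the sketch only handles a reserved range that covers one edge of the window.

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive address range. */
struct range {
	uint64_t start, end;
};

/*
 * Hypothetical helper: shrink window *w so it no longer overlaps the
 * reserved range *r when r covers the bottom or top edge of w.
 * Returns 1 if w was trimmed, 0 if it was left unchanged (no overlap,
 * r strictly inside w, or r covering all of w; none of the latter two
 * cases is handled by this sketch).
 */
static int trim_window(struct range *w, const struct range *r)
{
	assert(w->start <= w->end && r->start <= r->end);

	if (r->end < w->start || r->start > w->end)
		return 0;		/* disjoint */

	if (r->start <= w->start && r->end < w->end) {
		w->start = r->end + 1;	/* reserved range covers the bottom */
		return 1;
	}
	if (r->start > w->start && r->end >= w->end) {
		w->end = r->start - 1;	/* reserved range covers the top */
		return 1;
	}
	return 0;
}
```

Applied to the window [mem 0xbff00000-0xf7ffffff] with [mem 0xbff00000-0xbfffffff] reserved, this leaves [mem 0xc0000000-0xf7ffffff].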

Yinghai





* Re: [PATCH 3/3] PCI: allocate bus resources from the top down
  2010-09-16 19:33             ` Yinghai Lu
@ 2010-09-16 20:02               ` H. Peter Anvin
  0 siblings, 0 replies; 10+ messages in thread
From: H. Peter Anvin @ 2010-09-16 20:02 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, Jesse Barnes, Brian Bloniarz, Charles Butterfield,
	Denys Vlasenko, linux-pci, linux-kernel, Stefan Becker,
	Thomas Gleixner, Linus Torvalds, Ingo Molnar

On 09/16/2010 12:33 PM, Yinghai Lu wrote:
> 
> On those Dell machines, the BIOS sets a wrong value in the PCI bridge, and
> the resulting window partially overlaps a reserved range in E820.
> 
> The kernel currently honors the hardware setting read from the PCI bridge
> first.
> 
> So the best way should be to use the E820 reserved range to trim what we
> read from PCI or ACPI in some cases.
> 

Yes, that's the only sane option.

	-hpa
